(54) Title: PROTEIN EXPRESSION SYSTEM
(57) Abstract 

The present invention relates to improved recombinant vectors which allow for the production
of fusion proteins. The present invention also relates to methods for the expression and
purification of authentic recombinant proteins from such fusion proteins. In particular, the
present invention relates to fusion proteins wherein additional domains and/or elements are
added to the fusion proteins. Included in these domains and/or elements are Fc fragments (1)
fused to proteins of interest (2) by a polypeptide comprising a hinge region (3), hydrophilic
spacer (4), and a dibasic amino acid endoprotease cleavage site (5), wherein the spacer may be
cleaved and then digested by carboxypeptidase B (6) to yield the authentic protein (2).
PCT/US97/01470

PROTEIN EXPRESSION SYSTEM 



FIELD OF THE INVENTION 

The present invention relates to improved recombinant vectors which allow for the
production of fusion proteins and methods for the expression and purification of authentic
recombinant proteins from these fusion proteins.

BACKGROUND OF THE INVENTION 

The ability to isolate large quantities of recombinant proteins purified to homogeneity
is particularly important to the pharmaceutical industry. Recombinant proteins produced for
therapeutic applications must be free of antigens and toxins found in the host cell used for
protein production. Additionally, the recombinant protein should represent an authentic
version of the naturally occurring protein, i.e., a protein having the same primary amino acid
sequence as found in the naturally occurring protein.

Proteins encoded by recombinant DNA clones may be expressed and purified using a
variety of methods. Recombinant proteins may be expressed in prokaryotic hosts or in
eukaryotic hosts, such as yeast or mammalian cell lines. Prokaryotic hosts are more widely
used for the expression of recombinant proteins. Prokaryotes, such as Escherichia coli
(E. coli), are well characterized, easy to manipulate and grow in inexpensive media. Expression
of recombinant proteins in eukaryotic hosts is particularly attractive when the protein must
contain post-translational modifications which do not occur in prokaryotic hosts.

Expression Of Recombinant Proteins In Prokaryotic Hosts

E. coli is the most widely used host for the expression of recombinant proteins. Early
attempts to express foreign proteins in E. coli were unsuccessful due in part to rapid
proteolytic degradation of the foreign protein. Methods of recombinant protein expression
that use a prokaryote as the host cell often employ a technique that expresses a foreign
polypeptide fused with a bacterial protein. This is done to stabilize the foreign protein in the
host cell line.

Early attempts at the expression of foreign proteins in E. coli utilized the bacterial
β-galactosidase (β-gal) protein as the fusion partner. Many of the β-gal fusion proteins were
insoluble and were found in inclusion bodies [Itakura, K. et al., supra; Young, R.A. and ...
EMBO J. 3:1429 (1984)]. In some cases, active fusion protein was recovered from the
inclusion bodies by solubilization with denaturing reagents [Marston, F.A.O., Biochem. J. 240:1
(1986)]. In other cases, the fusion protein could not be recovered in an active form following
denaturation, presumably due to an inability of the denatured protein to refold correctly upon
renaturation.

Other bacterial gene products have been used to stabilize the expression of foreign
proteins in prokaryotic hosts. These include anthranilate synthetase (the trpE gene product
of E. coli), staphylococcal protein A, the maltose-binding protein encoded by the malE gene of
E. coli, and glutathione-S-transferase of Schistosoma japonicum.


Expression Of Recombinant Proteins In Eukaryotic Hosts 

Recombinant proteins are expressed in eukaryotic hosts rather than prokaryotic hosts
when the recombinant protein requires post-translational modifications such as glycosylation,
phosphorylation, disulfide bond formation, oligomerization or specific proteolytic cleavage to
produce a biologically active protein. These post-translational processes are not performed
by prokaryotic cells. Additionally, some eukaryotic proteins will not fold correctly or
efficiently when expressed in a prokaryotic host. Many expression systems have been
developed to produce proteins in eukaryotic hosts [for a review, see Sambrook, J. et al.,
Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, NY (1989),
pp. 16.3-16.29]. However, the costs of protein production are often higher when eukaryotic
cell lines are employed as the host cell.

To express a DNA sequence encoding a fusion protein in a eukaryotic cell line, a copy
of the sequences encoding the fusion protein is inserted into a suitable expression vector and
transfected into the desired host cell. When the fusion protein contains a signal sequence at
the amino-terminus, the fusion protein may be secreted into the culture medium. The
generation of such secreted fusion proteins allows for either continuous or batch harvest of
fusion protein from eukaryotic cells grown on free-flow hollow fiber cartridges.

Purification Of Recombinant Fusion Proteins

Affinity purification protocols were developed to facilitate the isolation of large
amounts of fusion proteins. Typically, a ligand capable of binding with high specificity to an
... the purification of β-gal fusion proteins [Germino, J. et al., Proc. Natl. Acad. Sci. USA
80:6848 (1983)]. Other expression systems which permit the affinity purification of fusion
proteins include fusion proteins made with glutathione-S-transferase, which are selectively
recovered on glutathione-agarose [Smith, D.B. and Johnson, K.S., Gene 67:31 (1988)].
IgG-Sepharose can be used to affinity purify fusion proteins containing staphylococcal protein A
[Uhlen, M. et al., Gene 23:369 (1983)]. The maltose-binding protein domain from the malE
gene of E. coli has been used as a fusion partner and allows the affinity purification of the
fusion protein on amylose resins. Fusion proteins having the hydrophilic octapeptide
Asp-Tyr-Lys-Asp-Asp-Asp-Asp-Lys (SEQ ID NO:1) at the amino-terminus can be affinity purified
on an immunoaffinity resin containing an antibody specific for the octapeptide [Hopp, T.P.
et al., Biotechnology 6:1204 (1988); Prickett, K.S. et al., BioTechniques 7:580 (1989); U.S.
Patent No. 4,851,341, the disclosure of which is herein incorporated by reference].

Other means of purifying fusion proteins include the polyarginine system, in which
the fusion protein is selectively purified on a cation exchange resin [Sassenfeld, H.M. and
Brewer, S.J., BioTechnology 2:76 (1984); U.S. Patent No. 4,532,207, the disclosure of which
is herein incorporated by reference]. Sassenfeld and Brewer reported a carboxy-terminal
extension of five arginine residues fused to a protein of interest (urogastrone). This basic
polyarginine extension allowed the purification of the fusion protein on an SP-Sephadex resin.
An analogous protein expression and purification system employs a polyhistidine tract or tag
at either the amino- or carboxy-terminus of the fusion protein. The fusion protein is purified
by chromatography on a Ni²⁺ metal affinity resin [Porath, J., Protein Expression and
Purification 3:7995 (1992)]. The use of small polypeptides as fusion partners (e.g., the
polyarginine or polyhistidine tag) may be insufficient to stabilize a wide variety of foreign
proteins in prokaryotes, since a fusion protein construct with only ten amino acids from β-gal
was insufficient to stabilize somatostatin [K. Itakura et al., Science 198:1056 (1977)].

Another means of achieving partial purification of foreign proteins in prokaryotes is
the addition of signal sequences to the foreign protein such that the protein is exported to the
periplasmic space in E. coli [Grey, G.L. et al., Gene 39:247 (1985); Baty, U. et al., Gene
16:79 (1981); Inouye, H. et al., J. Bacteriol. 149:434 (1982); Kato, C. et al., Gene 54:197
(1987)]. As the periplasm contains fewer proteins than does the cytoplasm, a partial
purification is achieved by export alone.
Cleavage Of Recombinant Fusion Proteins

The ability to express recombinant proteins as fusion proteins is useful in that it allows
the stable expression and affinity purification of the foreign proteins in eukaryotic and
prokaryotic hosts. However, in many cases it is desirable that the foreign protein be
recovered free from its stabilizing fusion partner. In some cases, the addition of the fusion
partner to the protein of interest destroys the activity of the foreign protein. When the foreign
protein is to be used for therapeutic purposes, the presence of the bacterial gene product can
elicit an undesirable immune response against the entire fusion protein in the recipient.

To address these problems, expression systems were developed in which the protein of
interest could be separated from all or a majority of the bacterial protein sequences. Many of
these systems provide for the generation of a tripartite hybrid protein in which a site for
proteolytic or chemical cleavage is inserted between the protein of interest and the fusion
partner. Sites for cleavage by collagenase [Germino, J. and Bastis, D., Proc. Natl. Acad. Sci.
USA 81:4692 (1984)], renin [Haffey, M.L. et al., DNA 6:565 (1987)], Factor Xa protease
[Nagai, K. and Thogersen, H.C., Nature 309:810 (1984); Smith, D.B. and Johnson, K.S., Gene
67:31 (1988)], thrombin (Smith, D.B. and Johnson, K.S., supra) and enterokinase [Hopp, T.P.
et al., supra; Prickett, K.S., BioTechniques 7:580 (1989); U.S. Patent No. 4,851,341, the
disclosure of which is herein incorporated by reference] have been inserted between the fusion
partner and the gene of interest.

The collagenase-based cleavage system places the protein of interest at the amino-terminal
end of the fusion protein, followed by 60 amino acids from chicken pro-α2 collagen,
followed by the entire β-galactosidase protein (Germino, J. and Bastis, D., supra). The
tripartite fusion protein is affinity purified on
p-aminophenyl-β-D-thiogalactosidyl-succinyldiaminohexyl-Sepharose. The protein of interest
is cleaved from the rest of the fusion protein by controlled digestion with collagenase.
Collagenase cleaves following the X and Y residues in the following sequence:
-Pro-X-Gly-Pro-Y- (where X and Y are any amino acid) (SEQ ID NO:2).
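The motif definition above can be checked mechanically. A minimal sketch, assuming the protein sequence is written in one-letter codes (Pro = P, Gly = G); the name `collagenase_sites` is hypothetical, and the returned cut positions simply restate the text's description (a cut following X and following Y) rather than model collagenase kinetics.

```python
import re

# -Pro-X-Gly-Pro-Y- (SEQ ID NO:2) in one-letter codes; "." matches any residue.
# The zero-width lookahead lets overlapping occurrences be reported.
COLLAGENASE_MOTIF = re.compile(r"(?=P.GP.)")


def collagenase_sites(seq: str) -> list[tuple[int, int]]:
    """Return (after_X, after_Y) cut positions for every motif occurrence.

    A position p means the chain would be split as seq[:p] | seq[p:].
    """
    sites = []
    for match in COLLAGENASE_MOTIF.finditer(seq.upper()):
        start = match.start()  # index of the first Pro
        sites.append((start + 2, start + 5))  # X sits at start+1, Y at start+4
    return sites


# Hypothetical sequence containing one motif (P-A-G-P-L at indices 2..6).
print(collagenase_sites("AAPAGPLMM"))  # [(4, 7)]
```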

Several limitations exist with the collagenase/β-gal fusion system. Collagenase
digestion does not remove all of the chicken collagen sequence from the carboxy-terminus of
the protein of interest; several amino acids (<10) remain. The presence of extra amino acids
is undesirable when the protein of interest is to be used for therapeutic applications.
... collagenase recognition sequence. Also, the use of β-gal as the fusion partner increases the
likelihood that the fusion protein will be insoluble [Shen, S.-H., Proc. Natl. Acad. Sci. USA
81:4627 (1987); Marston, F.A.O., Biochem. J. 240:1 (1986)].

A fusion protein cleavable by the endopeptidase renin was reported by Haffey, M.L. et
al., DNA 6:565 (1987). Renin cleaves between the leucine residues in the following
sequence: Pro-Phe-His-Leu-Leu-Val-Tyr (SEQ ID NO:3). A tripartite fusion protein
consisting of an Epstein-Barr virus membrane antigen protein (EBV-MA), the recognition
sequence for renin and the coding sequence for β-gal was produced. The fusion protein could
be cleaved by treatment with renin between the EBV-MA and β-gal proteins. Cleavage with
renin was reported to be efficient and specific. However, the use of a linker encoding a renin
recognition site results in a cleaved protein of interest which contains either three or four
linker-encoded amino acid residues (four residues remain on the carboxy-terminus of the
protein domain comprising the amino-terminal portion of the fusion protein, and three residues
remain on the amino-terminus of the cleaved protein domain comprising the carboxy-terminal
portion of the fusion protein). Thus, with the fusion system reported by Haffey et al., supra,
it is not possible to generate an authentic recombinant protein of interest.
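The four- and three-residue remnants follow directly from where renin cuts within its seven-residue site. A minimal sketch, assuming one-letter codes (Pro-Phe-His-Leu-Leu-Val-Tyr = PFHLLVY) and hypothetical placeholder sequences for the two flanking domains; `renin_cleave` is an illustrative name, not part of the published system.

```python
# Renin site from the text (SEQ ID NO:3) in one-letter codes; the cut falls
# between the two Leu residues, i.e. after the fourth residue of the site.
RENIN_SITE = "PFHLLVY"
CUT_OFFSET = RENIN_SITE.index("LL") + 1  # == 4


def renin_cleave(fusion: str) -> tuple[str, str]:
    """Split a fusion at its first renin site into (N-terminal, C-terminal) products."""
    idx = fusion.find(RENIN_SITE)
    if idx < 0:
        raise ValueError("no renin recognition site found")
    cut = idx + CUT_OFFSET
    return fusion[:cut], fusion[cut:]


upstream, downstream = "MKTA", "GGGG"  # hypothetical placeholders for the two domains
left, right = renin_cleave(upstream + RENIN_SITE + downstream)

# Linker-derived residues left behind on each cleavage product.
print(len(left) - len(upstream), len(right) - len(downstream))  # 4 3
```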

The recognition sequence for Factor Xa protease (i.e., the activated form of Factor X)
has been used to cleave the protein of interest from a fusion partner. Factor Xa protease
cleaves after the Arg in the following sequences: Ile-Glu-Gly-Arg-X, Ile-Asp-Gly-Arg-X and
Ala-Glu-Gly-Arg-X, where X is any amino acid except proline or arginine (SEQ ID NOS:4-6,
respectively) (Nagai, K. and Thogersen, H.C., supra). A fusion protein comprising the 31
amino-terminal residues of the cII protein, a Factor Xa cleavage site and human β-globin was
shown to be cleaved by Factor Xa to generate authentic β-globin [Nagai, K. and Thogersen,
H.C., Nature 309:810-812 (1984)].
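The Factor Xa rule above, including the exclusion of Pro or Arg at position X, can be written as a single pattern match. A minimal sketch assuming one-letter codes; `factor_xa_cuts` is a hypothetical helper, and it covers only the three canonical sites listed, not the cleavage at other arginine residues that the text notes Factor Xa can also perform.

```python
import re

# Ile-Glu-Gly-Arg, Ile-Asp-Gly-Arg and Ala-Glu-Gly-Arg (SEQ ID NOS:4-6),
# each followed by a residue X that is neither Pro (P) nor Arg (R).
FACTOR_XA = re.compile(r"(?:IEGR|IDGR|AEGR)(?=[^PR])")


def factor_xa_cuts(seq: str) -> list[int]:
    """Return cut positions p (seq[:p] | seq[p:]) immediately after the site's Arg."""
    return [m.end() for m in FACTOR_XA.finditer(seq.upper())]


print(factor_xa_cuts("MSIEGRMVHL"))  # [6]: cleavable
print(factor_xa_cuts("MSIEGRPVHL"))  # []: Pro follows the Arg
```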

Smith and Johnson, supra, developed a fusion system in which the amino-terminus of
the fusion protein was composed of the glutathione-S-transferase (GST) protein, followed by
the Factor Xa protease recognition sequence, which in turn was followed by the protein of
interest. The Factor Xa sequence was followed by a polylinker encoding several restriction
enzyme sites to allow for the insertion of the gene encoding the protein of interest.
Depending upon the restriction endonuclease site chosen for the insertion of the DNA
encoding the protein of interest, the cleaved protein may or may not have non-native amino
... wholly or partly soluble. Thus, the use of GST rather than β-gal as the fusion partner is an
improvement.

Guan and Riggs [Gene 67:21 (1987)] developed a fusion system utilizing the Factor Xa
cleavage site in which the amino-terminus of the fusion protein was derived from the
maltose-binding protein (MBP) of E. coli. The presence of the MBP on the fusion protein
allows for affinity purification on amylose resins. The MBP sequences are followed by the
Factor Xa cleavage site, which in turn is followed by the protein of interest at the
carboxy-terminus of the fusion protein. The number of non-native amino acids added to the
protein of interest as a result of cleaving the fusion partner from the fusion protein is a
function of the primary and secondary structure of the junction site. Thus, limitations imposed
by the design of the junction site preclude the universal use of the Factor Xa system of Smith
and Johnson to generate an authentic recombinant protein.

Two different versions of the MBP vectors exist. One version contains the signal
sequence of the malE gene. The presence of this sequence directs the fusion protein to the
periplasm. The other version lacks this signal sequence so that the fusion protein remains in
the cytoplasm. Vectors which direct the MBP fusion protein to the cytoplasm generally give
higher yields than do the vectors which allow for export to the periplasm. However, since
some foreign proteins will not fold properly in the reducing environment of the E. coli
cytoplasm, transport to the less reducing environment of the periplasm often will allow proper
folding. The use of a vector which produces a fusion protein exported to the periplasm is
usually preferred for foreign proteins that are secreted or contain disulfide bonds [Riggs, P.,
Curr. Protocols Mol. Biol. 16.6.12 (1990)].

The use of GST or MBP as the fusion partner is an improvement over the use of β-gal,
which was used in the collagenase cleavage system. However, cleavage by Factor Xa is
inefficient for many fusion proteins. It is reported that only about 50% of the fusions made
with Factor Xa cleavage sites and MBP are cleaved by Factor Xa following affinity
purification [Riggs, P., Curr. Protocols Mol. Biol., supra]. It has been postulated that
inefficient Factor Xa cleavage is the result of inaccessibility of the cleavage site within the
fusion protein.

In order to cleave some fusion proteins which contain a Factor Xa cleavage site,
denaturation of the fusion protein is required. It is likely that denaturation of the fusion
... Furthermore, exposing the recombinant protein to harsh denaturants may alter the functional
activity and/or the antigenicity of the purified protein. In addition, once denatured, many
proteins do not renature (i.e., they become irreversibly denatured or unfolded).

The insertion of a linker or spacer between the Factor Xa site and the protein of
interest has been reported to facilitate the cleavage of some fusion proteins. However, the
insertion of the linker results in the addition of extra (i.e., not naturally occurring) amino
acids at the amino-terminus of the protein of interest (Riggs, P., supra at 16.6.13).
Another limitation of the Factor Xa-based fusion systems is that Factor Xa has been
reported to cleave at arginine residues that are not present within the Factor Xa recognition
sequence [Nagai, K. and Thogersen, H.C., supra; Lauritzen, C. et al., Prot. Expr. Purif.
2:372 (1991)]. Additionally, Factor Xa will not cleave at the recognition site if the site is
followed by a proline or arginine residue (Riggs, P., supra at 16.6.13).

Smith and Johnson, supra, also reported the generation of GST fusion proteins which
contained a cleavage site for thrombin in place of the Factor Xa site. Thrombin cleaves Arg-X
and Lys-X bonds (where X is any amino acid). Preferred cleavage sites for thrombin are
(1) P4-P3-Pro-Arg-P1'-P2', where P3 and P4 are hydrophobic amino acids and P1' and P2'
are nonacidic amino acids, and (2) P2-Arg-P1', where P2 or P1' is Gly (Chang, J.-Y., Eur. J.
Biochem. 151:217 (1985)). Smith and Johnson utilized the following thrombin cleavage site:
Leu-Val-Pro-Arg-Gly-Ser (SEQ ID NO:7). Cleavage by thrombin was noted to be faster and
more efficient than cleavage of analogous fusion proteins containing the Factor Xa site. The
chief drawback to the use of this vector system in producing recombinant proteins is that,
typically, extra amino acids remain at the amino-terminus of the protein of interest after
cleavage (as is the case for the GST/thrombin fusions). This occurs because thrombin
requires particular amino acid residues surrounding the Arg or Lys residue where
cleavage occurs.
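The two preferred thrombin patterns, and why the Leu-Val-Pro-Arg-Gly-Ser site leaves extra residues on the protein of interest, can be sketched as follows. This is a simplified model assuming one-letter codes that scans Arg residues only; the membership of the `HYDROPHOBIC` set is an illustrative assumption (the text does not enumerate it), and the flanking sequences are hypothetical placeholders.

```python
# ASSUMPTION: illustrative set of hydrophobic residues; not specified in the text.
HYDROPHOBIC = frozenset("AVLIMFW")
ACIDIC = frozenset("DE")  # "nonacidic" is taken to mean neither Asp nor Glu


def thrombin_sites(seq: str) -> list[tuple[int, int]]:
    """Return (cut_position, rule) for Arg residues matching a preferred pattern.

    Rule 1: P4-P3-Pro-Arg-P1'-P2', P4/P3 hydrophobic, P1'/P2' nonacidic.
    Rule 2: P2-Arg-P1', with Gly at P2 or P1'.
    Rule 1 is reported when both apply. The cut falls after the Arg.
    """
    s = seq.upper()
    hits = []
    for i, residue in enumerate(s):
        if residue != "R" or i + 1 >= len(s):
            continue
        rule1 = (
            i >= 3
            and i + 2 < len(s)
            and s[i - 1] == "P"
            and s[i - 2] in HYDROPHOBIC
            and s[i - 3] in HYDROPHOBIC
            and s[i + 1] not in ACIDIC
            and s[i + 2] not in ACIDIC
        )
        rule2 = i >= 1 and (s[i - 1] == "G" or s[i + 1] == "G")
        if rule1:
            hits.append((i + 1, 1))
        elif rule2:
            hits.append((i + 1, 2))
    return hits


# Hypothetical construct: partner tail + LVPRGS (SEQ ID NO:7) + protein of interest.
partner_tail, protein = "EAAK", "MAQW"
fusion = partner_tail + "LVPRGS" + protein
cut, rule = thrombin_sites(fusion)[0]
print(rule, fusion[cut:])  # 1 GSMAQW
```

The Gly-Ser carried over from the thrombin site onto the released sequence is the kind of amino-terminal remnant described above.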

A fusion system which uses chemical rather than enzymatic cleavage has
been reported [for a review, see Nilsson, B., Meth. Enz. 198:3 (1991)]. In this system,
staphylococcal protein A (SpA) forms the amino-terminal portion of the fusion protein,
facilitating affinity purification on IgG-Sepharose. The vector used to generate the fusion
protein contains sequentially (amino- to carboxy-terminus) the signal sequence of protein A,
two copies of the IgG binding domains of protein A, followed by the protein of interest. The
signal sequence of protein A facilitates the appearance of the fusion protein in the culture
... treatment with hydroxylamine, cyanogen bromide (CNBr) or N-chlorosuccinimide.
Hydroxylamine cleaves the Asn-Gly bond and thus requires that the first amino
acid of the protein of interest be glycine. CNBr cleaves at methionine residues; therefore,
when the protein of interest contains internal methionine residues, a partial digestion must be
performed. N-chlorosuccinimide cleaves on the carboxy-terminal side of tryptophan residues;
therefore, the protein of interest must not contain tryptophan residues. Thus, the use of the
SpA fusion system in conjunction with chemical cleavage of the fusion protein is limited.
Chemical cleavage requires the absence of specific residues internal to the protein of interest
or the presence of specific amino acids in the sequence at the junction between the protein of
interest and the linker sequences.
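The three constraints above can be turned into a quick screening check for a candidate protein of interest. A minimal sketch, assuming one-letter codes; the function name is hypothetical, and the check for internal Asn-Gly bonds goes one step beyond the text on the basis that hydroxylamine cleaves Asn-Gly bonds wherever they occur.

```python
def chemical_cleavage_problems(protein: str) -> dict[str, list[str]]:
    """List the constraints from the text that a protein of interest would violate.

    `protein` is the intended mature sequence (one-letter codes), N- to C-terminus.
    """
    p = protein.upper()
    problems: dict[str, list[str]] = {
        "hydroxylamine": [],
        "CNBr": [],
        "N-chlorosuccinimide": [],
    }
    # Hydroxylamine cuts Asn-Gly bonds: the protein must start with Gly so that the
    # junction reads ...Asn|Gly..., and any internal Asn-Gly would also be cut.
    if not p.startswith("G"):
        problems["hydroxylamine"].append("first residue is not Gly")
    if "NG" in p:
        problems["hydroxylamine"].append("contains an Asn-Gly bond")
    # CNBr cuts at Met: any Met within the protein forces a partial digestion.
    if "M" in p:
        problems["CNBr"].append("contains Met; only partial digestion is possible")
    # N-chlorosuccinimide cuts after Trp: the protein must lack Trp entirely.
    if "W" in p:
        problems["N-chlorosuccinimide"].append("contains Trp")
    return problems


print(chemical_cleavage_problems("MKWNGA"))
```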

The art needs a fusion and cleavage system which allows for efficient cleavage
and generation of authentic proteins of interest that do not contain extraneous (i.e.,
non-naturally occurring) amino acids.

SUMMARY OF THE INVENTION

The present invention relates to compositions and methods for producing authentic
proteins by recombinant means. The invention provides novel fusion proteins and
recombinant DNA vectors encoding the same, as well as methods for the production of
authentic proteins from the novel fusion proteins. In one embodiment the invention provides
fusion proteins comprising three domains joined together in order from amino-terminus to
carboxy-terminus: a first domain comprising a protein of interest, a second domain
comprising a hydrophilic spacer, and an affinity domain, each domain comprising amino acid
residues. It is not required that each of these domains be contiguous with one another. The
invention contemplates fusion proteins wherein additional domains and/or elements (e.g., a
penultimate enhancer and/or a CPB terminator) are inserted between the three domains
described above. The invention further contemplates a fusion protein wherein the hydrophilic
spacer is an arginine residue and the hydrophilic spacer and the affinity domain are separated
by a domain comprising 1 to 19 amino acid residues, wherein these 1 to 20 residues are
capable of removal by a means for selective amino acid removal. In a preferred embodiment
these 1 to 20 residues are removed by selective endoprotease cleavage and/or a
carboxypeptidase, the latter preferably selected from the group comprising carboxypeptidase
...

The fusion proteins of the present invention comprise a domain comprising a
hydrophilic spacer. In a particularly preferred embodiment, the amino acids of the
hydrophilic spacer are susceptible to removal by a means for selective amino acid removal.
In yet another preferred embodiment, the means for selective amino acid removal comprises a
carboxypeptidase, preferably selected from the group comprising carboxypeptidase A,
carboxypeptidase B and carboxypeptidase Y.

In a particularly preferred embodiment, the susceptible amino acids of the hydrophilic
spacer are selected from the group consisting of arginine and lysine.
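The reason for choosing arginine and lysine can be illustrated with a simplified model of carboxypeptidase B, which removes basic residues from the carboxy-terminus. A minimal sketch under stated simplifications: digestion is assumed to run to completion and to stop at the first non-basic residue, and all sequences are hypothetical placeholders.

```python
BASIC = frozenset("RK")  # Arg and Lys, the residues CPB is modeled as removing


def cpb_trim(seq: str) -> str:
    """Simplified carboxypeptidase B: strip C-terminal Arg/Lys one residue at a time.

    Stops at the first residue, reading from the C-terminus, that is not Arg or Lys.
    """
    end = len(seq)
    while end > 0 and seq[end - 1].upper() in BASIC:
        end -= 1
    return seq[:end]


protein_of_interest = "MAQWLT"           # hypothetical authentic sequence
released = protein_of_interest + "RRKR"  # hypothetical basic spacer remnant
print(cpb_trim(released) == protein_of_interest)  # True
```

The same model also exposes a limitation: a protein whose authentic carboxy-terminus is itself Arg or Lys would lose that residue, since `cpb_trim("MAQWLK")` returns `"MAQWL"`.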

In one embodiment, the susceptible amino acids of the hydrophilic spacer have a sequence
selected from the group comprising SEQ ID NOS:16-37. The hydrophilic spacers of the
novel fusion proteins may comprise an extended hydrophilic spacer. In a preferred
embodiment, the extended hydrophilic spacer comprises the amino acid sequence of either
SEQ ID NO:18 or 19 joined to the carboxy-terminus of an amino acid sequence selected
from the group comprising SEQ ID NOS:16-37, such that SEQ ID NO:18 or 19 is
located between the sequence selected from SEQ ID NOS:16-37 and the affinity domain.

The fusion proteins of the present invention may further comprise a signal peptide
sequence located at the amino-terminus of the fusion protein and joined to the first domain
(i.e., the protein of interest). In a preferred embodiment, the signal sequence is the sequence
of SEQ ID NO:61.

In a particularly preferred embodiment, the fusion protein comprises an endoprotease
recognition sequence joined to the second domain (i.e., the hydrophilic spacer) between the
second domain and the affinity domain. In yet another preferred embodiment, the fusion
protein containing an endoprotease recognition sequence comprises a CPB terminator joined to
the first domain comprising the protein of interest, between the first domain and the second
domain comprising the hydrophilic spacer.

In still another preferred embodiment, the fusion protein containing an endoprotease 
recognition sequence further comprises a penultimate enhancer joined to the second domain 
comprising the hydrophilic spacer and between the second domain and the endoprotease 
recognition sequence. 

The invention also provides recombinant DNA vectors having a nucleotide sequence
encoding a fusion protein comprising three domains joined together in order, from
amino-terminus to carboxy-terminus, of a first domain comprising a protein of interest, a second
domain comprising a hydrophilic spacer, and an affinity domain, each domain comprising




amino acid residues. In a preferred embodiment, the recombinant DNA vector encodes a
fusion protein wherein the amino acids of the encoded hydrophilic spacer are susceptible to
removal by a means for selective amino acid removal, the latter preferably being a
carboxypeptidase. In another preferred embodiment, the amino acids comprising the encoded
hydrophilic spacer are removable using a carboxypeptidase selected from the group
comprising carboxypeptidase A, carboxypeptidase B and carboxypeptidase Y. In yet another
preferred embodiment, the recombinant vector encodes a fusion protein wherein the
susceptible amino acids of the encoded hydrophilic spacer are selected from the group
consisting of arginine and lysine; particularly preferred encoded hydrophilic spacers comprise
sequences selected from the group comprising SEQ ID NOS:16-37. The encoded hydrophilic
spacer may comprise an extended hydrophilic spacer; in a preferred embodiment the encoded
extended hydrophilic spacer comprises the amino acid sequence of either SEQ ID NO:18 or
19 in combination with any of SEQ ID NOS:16-37, wherein SEQ ID NO:18 or 19 is linked
via its amino-terminus to the carboxy-terminus of SEQ ID NOS:16-37 and joined via its
carboxy-terminus to the affinity domain.

The invention further provides a method of producing authentic recombinant proteins
of interest, comprising: a) providing: i) a recombinant DNA vector encoding a fusion protein
comprising three domains joined together in order from amino-terminus to carboxy-terminus
of a first domain comprising a protein of interest, a second domain comprising a hydrophilic
spacer, a third domain comprising an endoprotease recognition sequence, and an affinity
domain, each domain comprising amino acid residues; ii) a host cell suitable for expressing said
fusion protein encoded by said recombinant DNA vector; iii) an endoprotease capable of
cleaving said fusion protein within said endoprotease recognition sequence; iv) an affinity
resin capable of interacting with said affinity domain on said fusion protein; and v) a means
for removing non-authentic amino acids from said first domain comprising said protein of
interest; b) introducing said vector into said host cell under conditions such that said fusion
protein is expressed; c) purifying said expressed fusion protein by means of interaction of said
affinity domain on said fusion protein with an affinity resin; d) cleaving said purified fusion
protein with said endoprotease to generate a released protein of interest; and e) removing any
non-authentic amino acids present at the carboxy-terminus of said released protein of interest
with said removal means to produce an authentic protein of interest. The invention is not
limited to the use of fusion proteins wherein the hydrophilic spacer and the endoprotease
domain are two separate domains. As discussed below, in some cases the hydrophilic spacer
may also serve as the endoprotease domain.
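Steps d) and e) of the method can be traced on a toy sequence. A minimal sketch, assuming the domain order given above (protein of interest, hydrophilic spacer, endoprotease recognition sequence, affinity domain), an endoprotease that cuts immediately after its recognition sequence, and a carboxypeptidase that removes carboxy-terminal Arg/Lys; the function name, the dibasic site "KR" and every sequence are hypothetical placeholders.

```python
def release_and_trim(fusion: str, site: str) -> str:
    """Model steps d) and e) on a one-letter-code fusion protein.

    d) cut immediately after the first occurrence of `site`, keeping the
       amino-terminal fragment (protein of interest + spacer + site residues);
    e) strip carboxy-terminal Arg/Lys, as a carboxypeptidase B-like enzyme would.
    """
    idx = fusion.find(site)
    if idx < 0:
        raise ValueError("endoprotease recognition sequence not found")
    released = fusion[: idx + len(site)]  # the affinity-domain fragment is discarded
    end = len(released)
    while end > 0 and released[end - 1] in "RK":
        end -= 1
    return released[:end]


protein = "MAQWLT"       # hypothetical protein of interest
spacer = "RRRRR"         # hypothetical hydrophilic (Arg) spacer
site = "KR"              # hypothetical dibasic endoprotease site
affinity = "EPKSCDKTHT"  # placeholder standing in for an affinity domain
print(release_and_trim(protein + spacer + site + affinity, site))  # MAQWLT
```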

In a preferred embodiment, the method of producing an authentic protein of interest
employs a removal means which comprises at least one carboxypeptidase, and the removal
comprises contacting the released protein of interest with at least one carboxypeptidase under
conditions such that the non-authentic amino acids are removed to generate the authentic
protein of interest.

The methods of the invention are not limited to the use of a particular affinity domain.
In one embodiment, the affinity domain comprises a portion of the Fc domain of human
IgG1; in this case, the fusion protein is purified using an affinity resin selected from the
group comprising protein A and protein G. In another embodiment, the affinity domain
comprises a portion of the protein glutathione-S-transferase; in this case, the fusion protein is
purified on a glutathione resin. In yet another embodiment, the affinity domain comprises a
portion of the maltose-binding protein; in this case, the fusion protein is purified on an
amylose resin. In still another embodiment, the affinity domain comprises a portion of
staphylococcal protein A; in this case, the fusion protein is purified on an IgG resin. In
another embodiment, the affinity domain comprises a portion of the protein β-galactosidase;
in this case, the fusion protein is purified on
p-aminophenyl-β-D-thiogalactosidyl-succinyldiaminohexyl-Sepharose.
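The pairings of affinity domain and affinity resin listed above amount to a lookup table. A small sketch of that table; the dictionary keys and the `resins_for` helper are chosen here for illustration only.

```python
# Affinity domain -> resin(s) used to capture the fusion protein, as listed above.
AFFINITY_RESINS: dict[str, tuple[str, ...]] = {
    "human IgG1 Fc": ("protein A", "protein G"),
    "glutathione-S-transferase": ("glutathione",),
    "maltose-binding protein": ("amylose",),
    "staphylococcal protein A": ("IgG",),
    "beta-galactosidase": (
        "p-aminophenyl-beta-D-thiogalactosidyl-succinyldiaminohexyl-Sepharose",
    ),
}


def resins_for(domain: str) -> tuple[str, ...]:
    """Return the resin(s) for an affinity domain; raises KeyError if it is not listed."""
    return AFFINITY_RESINS[domain]


print(resins_for("human IgG1 Fc"))  # ('protein A', 'protein G')
```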


DESCRIPTION OF THE DRAWINGS 

Figure 1 provides a schematic illustrating the processing of fusion proteins having 
Level 1 linker designs. 

Figure 2 provides a schematic illustrating the processing of fusion proteins having
Level 2 linker designs.

Figure 3 provides a schematic illustrating the processing of fusion proteins having
Level 3 linker designs.

Figure 4 depicts the junction region of the pMal-p2 vector.

Figure 5 provides a map of the pMA2-TH vector.
Figure 6 provides a map of the pMA2-TH-IgG vector.

Figure 7 depicts the junction region of the pMA2-TH-IgG vector.

Figure 8 provides a map of the pM-Col-K vector. 
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Figure 10 depicts the nucleotide (SEQ ID NO:49) and amino acid sequence (SEQ ID 
NO;50) of the hmge and Fc portion of the human IgG 1 molecule. 

Figure 1 1 depicts the nucleotide and amino acid sequence of three oligonucleotides 
used in the construction of vectors having three variations of the hinge region of the IgGl 
molecule. 

Figure 12 provides a map of the pSTlg-1 vector. 

Figure 13 depicts the nucleotide sequence of and the amino acid sequence encoded by 
the pho signal formed by the annealing of four oligonucleotides. 
Figure 14 provides a map of the pTVkIg-1 vector. 
Figure 15 depicts the junction region of the pTVkIg-1 vector. 
Figure 16 provides a map of the pTVMam-Ren vector. 
Figure 17 depicts the junction region of the pTVMam-Ren vector. 
Figure 1 8 depicts the multiple cloning site present in the pTVMam-Ren vector. 
Figure 19 provides a map of the pTVBac-klg vector. 
1 5 Figure 20 depicts the junction region of the pTVBac-kJg vector. 

Figure 21 depicts the multiple cloning site present in the pTVBac-klg vector. 
Figure 22 depicts the nucleotide sequence of and the amino acid sequence encoded by 
the thrombin and renin linker sequences. 

Figure 23 is a chromatograph generated by an HPLC spectrophotometer. 
20 Figure 24A is a chromatograph generated by an HPLC spectrophotometer. 

Figure 24B is a chromatograph generated by an HPLC spectrophotometer. 
Figure 25A is a chromatograph generated by an HPLC spectrophotometer. 
Figure 253 is a chromatograph generated by an HPLC spectrophotometer. 
Figure 26 is a chromatograph generated by an HPLC spectrophotometer. 
25 Figure 27 is a chromatograph generated by an HPLC spectrophotometer. 

Figure 28 is a plot of log S/(S-P) versus time (seconds) using the N-CBZ-Ala-Pro
substrate.

Figure 29 is a plot of log S/(S-P) versus time (seconds) of incubation of the control
peptide substrate in the CPD-Y Acti-Disk matrix.

Figure 30 is a table showing the relative rates of release (hydrolysis) for carboxy- 
terminal amino acids from various dipeptides. 

Figure 31 depicts the nucleotide and amino acid sequence of human preproNGF
Figure 33 provides a map of the pTV-TH-NGF vector.
Figure 34 depicts the junction region of the pTV-TH-NGF vector. 
Figure 35 provides a map of the pTVM-R-BDNF vector.
Figure 36 depicts the junction region of the pTVM-R-BDNF vector. 
Figure 37 provides a map of the pUC/FUR vector.

Figure 38 provides a map of the pSV2-fur vector. 



DEFINITIONS 

To facilitate understanding of the invention, a number of terms are defined below. 
The term "in operable combination" as used herein refers to the linkage of nucleic acid
sequences in such a manner that a nucleic acid molecule capable of directing the synthesis of
a desired protein molecule is produced. The term also refers to the linkage of amino acid
sequences in such a manner that a functional protein is produced.

The term "recombinant DNA molecule" as used herein refers to a DNA molecule
which is composed of segments of DNA joined together by means of molecular biological
techniques.

The term "recombinant protein" as used herein refers to a protein molecule which is 
expressed from a recombinant DNA molecule. 

The term "expression vector" as used herein refers to nucleic acid sequences containing
a desired coding sequence and appropriate nucleic acid sequences necessary for the expression
of the operably linked coding sequence in a particular host organism. Nucleic acid sequences
necessary for expression in prokaryotes include a promoter, a ribosome binding site,
optionally an operator sequence and possibly other sequences. Eukaryotic cells utilize
promoters and, often, enhancers and polyadenylation signals.

Because mononucleotides are reacted to make oligonucleotides in a manner such that
the 5' phosphate of one mononucleotide pentose ring is attached to the 3' oxygen of its
neighbor in one direction via a phosphodiester linkage, an end of an oligonucleotide is
referred to as the "5' end" if its 5' phosphate is not linked to the 3' oxygen of a
mononucleotide pentose ring and as the "3' end" if its 3' oxygen is not linked to a 5'
phosphate of a subsequent mononucleotide pentose ring. As used herein, a nucleic acid
sequence, even if internal to a larger oligonucleotide, also may be said to have 5' and 3' ends.
The term "hydrophilic" when used in reference to amino acids refers to those amino
include lysine, arginine, histidine, aspartate (i.e., aspartic acid), glutamate (i.e., glutamic acid),
glycine, serine, threonine, cysteine, tyrosine, asparagine and glutamine.

The term "means for selective amino acid removal" refers to means, such as enzymes,
which are capable of removing specific amino acid residues but which do not remove, or can
be prevented from removing, other amino acid residues which comprise the authentic protein
of interest. Carboxypeptidases, such as CPA, CPB and CPD-Y, are particularly preferred
means for selective removal of the amino acid residues comprising the hydrophilic spacers of
the present invention. Amino acid residues which can be removed (i.e., hydrolyzed or
digested) by a carboxypeptidase are said to be susceptible to removal by that
carboxypeptidase. Carboxypeptidases comprise a group of enzymes that hydrolyze peptide
bonds one amino acid at a time, from the carboxy-terminus of a polypeptide.
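The idea of residues being "susceptible to removal" by a carboxypeptidase can be illustrated with a minimal Python sketch. It assumes an all-or-nothing model of susceptibility (a residue either is or is not released by a given enzyme) and uses one-letter amino acid codes written amino- to carboxy-terminus. Real carboxypeptidases release different residues at different rates, so the residue set below is an illustrative assumption, not a measured specificity.

```python
from typing import FrozenSet, List, Tuple


def carboxypeptidase_digest(
    seq: str, susceptible: FrozenSet[str]
) -> Tuple[str, List[str]]:
    """Remove residues one at a time from the carboxy-terminus (right end)
    of `seq` for as long as the terminal residue is in `susceptible`.

    Returns the remaining chain and the released residues in release order.
    All-or-nothing susceptibility is a simplifying assumption of this sketch.
    """
    released: List[str] = []
    while seq and seq[-1] in susceptible:
        released.append(seq[-1])  # exactly one residue released per step
        seq = seq[:-1]
    return seq, released


# Illustrative specificity: an enzyme that releases only Arg (R) and Lys (K).
BASIC_ONLY = frozenset({"R", "K"})
remaining, released = carboxypeptidase_digest("GSAQRKR", BASIC_ONLY)
# remaining == "GSAQ", released == ["R", "K", "R"]
```

Digestion stops at the first residue the enzyme cannot release (here Gln), which is the property the hydrophilic spacer designs rely on.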

The term "hydrophilic spacer" refers to combinations of 1 to 5 predominantly
hydrophilic amino acids present within the fusion proteins of the present invention, wherein at
least one of the amino acid residues is an arginine residue. Preferred hydrophilic spacers
comprise 3 to 5 hydrophilic amino acids. The term "extended hydrophilic spacer" refers to
combinations of 6 to 8 predominantly hydrophilic amino acids. Particularly preferred
hydrophilic spacers and/or extended hydrophilic spacers comprise only arginine and lysine
residues; arginine and lysine residues are effectively removed by CPB. The hydrophilic
spacers of the present invention contain at least one arginine residue; the arginine residues
provide barriers or termination points for CPA digestions (i.e., CPA cannot remove arginine
residues). Authentic proteins of interest are generated from the fusion protein by selective
removal of non-authentic amino acids from the carboxy-terminus of the fusion protein (after
the fusion protein has been cleaved by the desired endoprotease). The arginine residue(s)
within the hydrophilic spacer act as a barrier to excessive digestion (i.e., digestion into the
protein of interest) of the fusion protein by CPA. When CPA encounters an arginine residue,
it cannot proceed further. At that point CPB, which can only remove arginine and lysine
residues, is used to digest the remaining arginine and/or lysine residues of the spacer to
generate the authentic protein of interest. As discussed further below, doublets of lysine
residues, which are extremely resistant to carboxypeptidase Y (CPD-Y) digestion, may be
employed in the hydrophilic spacers. Hydrophilic spacers containing lysine doublets are
employed in level 3 linker processing designs, which require the use of CPD-Y for the
generation of authentic proteins.
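The interplay described above, in which CPA halts at an arginine in the spacer and CPB then removes the remaining basic residues, can be sketched as follows. This is a simplified model under stated assumptions: CPA is treated as releasing every residue except arginine (the only CPA restriction stated in this passage), CPB as releasing only arginine and lysine, and the protein, spacer and endoprotease-site remnant sequences are hypothetical.

```python
BASIC = {"R", "K"}


def cpa_digest(seq: str) -> str:
    """Simplified CPA model: release C-terminal residues until an arginine
    is at the carboxy-terminus. Only the arginine barrier stated in the
    text is modeled; other CPA preferences are ignored (assumption)."""
    while seq and seq[-1] != "R":
        seq = seq[:-1]
    return seq


def cpb_digest(seq: str) -> str:
    """Simplified CPB model: release only C-terminal arginine and lysine."""
    while seq and seq[-1] in BASIC:
        seq = seq[:-1]
    return seq


# Hypothetical example: protein of interest + Arg-containing spacer +
# a leftover non-basic fragment of the endoprotease site.
protein = "MKTAYIAQ"
spacer = "RKR"
site_remnant = "VLA"

after_cpa = cpa_digest(protein + spacer + site_remnant)  # stops at the spacer's Arg
authentic = cpb_digest(after_cpa)                        # removes R, K, R

# With no arginine in the spacer, this model's CPA digests into the protein
# (here all the way, since the example protein contains no Arg).
overdigested = cpa_digest(protein + "KK" + site_remnant)
```

The last line shows why the spacer must contain arginine: with only lysines in the spacer, this simplified CPA model never meets a barrier and strips the entire chain.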
In addition to providing a means for generating authentic proteins by providing
residues which are capable of selective removal (e.g., using carboxypeptidases), the
hydrophilic and basic nature of arginine and lysine residues causes them to be oriented
within exposed regions of the fusion protein. This increases the likelihood that the
hydrophilic linker (i.e., the hydrophilic spacer and the endoprotease site) will be accessible to
digestion with endoproteases.

The term "penultimate enhancer" refers to a single amino acid residue which increases
the rate or efficiency at which the amino-terminal residue of the endoprotease recognition
sequence is removed during the carboxypeptidase reactions of level 2 or 3 linker designs.

Particularly preferred penultimate enhancers comprise hydrophobic aliphatic residues (e.g.,
leucine, isoleucine, valine) because they are preferred in the penultimate position by both
CPD-Y and CPA. Hydrophobic aliphatic residues are preferred penultimate enhancers in
linker designs when the fusion protein is to be expressed in a host cell which produces furin.
Because the hydrophilic spacers bear a resemblance to the furin recognition site, a
hydrophobic aliphatic residue is positioned after the carboxy-terminal residue in the
hydrophilic spacer to prevent any aberrant furin cleavage. When host cells are used which do
not produce furin (e.g., AG1 E. coli cells), the penultimate enhancer may comprise any amino
acid residue which is efficiently removed by CPA and which, when present in the
penultimate position, is favored by CPD-Y (i.e., phenylalanine, tryptophan, leucine,
isoleucine, valine, alanine and methionine).

If the junction between the endoprotease site and the hydrophilic spacer is formed by
the juxtaposition of an amino acid residue which is slowly released from the endoprotease
recognition sequence (the amino-terminal residue of the endoprotease site) with an amino acid
residue at the carboxy-terminal position of the hydrophilic spacer that is also slowly released
(e.g., arginine and/or lysine residues), the result is an amino acid pair that is processed
extremely slowly in the carboxypeptidase reaction (CPD-Y and CPA). In order to increase
the speed and efficiency of transition from CPD-Y to CPA to CPB digestion, a preferred
amino acid (i.e., a penultimate enhancer) is added at the junction between the hydrophilic
spacer and the endoprotease recognition sequence (see Figure 36 for an example). The
residue which functions as the penultimate enhancer will increase the rate at which the
amino-terminal residue of the endoprotease site is removed by digestion with carboxypeptidase.
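The placement rule for a penultimate enhancer can be expressed as a small helper. This sketch assumes one-letter codes, takes the allowed residues directly from the lists above, and uses a hypothetical spacer (RRKR) together with the renin recognition sequence defined elsewhere in this document as the endoprotease site; the function name and the default choice of leucine are arbitrary assumptions.

```python
from typing import Optional

# Residue choices as listed in the text (one-letter codes).
FURIN_HOST_ENHANCERS = ("L", "I", "V")  # hydrophobic aliphatic examples
NON_FURIN_HOST_ENHANCERS = ("F", "W", "L", "I", "V", "A", "M")


def insert_penultimate_enhancer(
    spacer: str, site: str, furin_host: bool, enhancer: Optional[str] = None
) -> str:
    """Join the hydrophilic spacer and the endoprotease site with a single
    enhancer residue between them."""
    allowed = FURIN_HOST_ENHANCERS if furin_host else NON_FURIN_HOST_ENHANCERS
    if enhancer is None:
        enhancer = allowed[0]  # default choice is arbitrary (assumption)
    if enhancer not in allowed:
        raise ValueError(f"{enhancer!r} is not a listed enhancer for this host")
    return spacer + enhancer + site


spacer, site = "RRKR", "PFHLLVY"  # hypothetical spacer; renin site from the text
junction = insert_penultimate_enhancer(spacer, site, furin_host=True)

# Once digestion has trimmed back to the site's amino-terminal residue (P),
# look at which residue occupies the penultimate position.
trimmed = junction[: len(spacer) + 2]
penultimate = trimmed[-2]
```

In this example the penultimate residue at that stage is the inserted leucine rather than the spacer's carboxy-terminal arginine, which is the situation the enhancer is meant to create.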

The term "CPB terminator" refers to a single amino acid that prevents the digestion of
hydrophilic spacer with carboxypeptidase B (CPB). CPB removes only arginine and lysine
residues. Amino acids which are particularly preferred as CPB terminators are hydrophobic
aliphatic residues (e.g., leucine, isoleucine, valine) as these residues are removed quickly by
carboxypeptidase A (CPA) and carboxypeptidase Y (CPD-Y). A hydrophobic aliphatic
residue at this position will also prevent any cleavage at the authentic molecule linker junction
site by furin should the design be used in a mammalian host system and the desired molecule
contain a furin recognition motif directly at its carboxy-terminus. When the protein of
interest to be expressed in the fusion protein does not contain a furin recognition site or when
a non-furin producing host cell is employed, any amino acid that is rapidly released by CPA
and that is not released by CPB can be used as a CPB terminator (i.e., phenylalanine,
tryptophan, leucine, isoleucine, valine, alanine and methionine). A CPB terminator is
employed in the linker design when the protein of interest contains an arginine or lysine at its
carboxy terminus; the CPB terminator is located on the carboxy-terminal side of the authentic
arginine or lysine, between the authentic protein of interest and the hydrophilic spacer (see
Figure 34 for an example).
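The decision of whether a CPB terminator is needed, and which residue to use, can be sketched as below. The function names, the `furin_risk` flag (true when a furin-producing host is used and the protein ends in a furin motif) and the example protein are hypothetical; the residue lists are the ones given in the text, and the simplified CPB model releases only arginine and lysine.

```python
from typing import Optional

BASIC = {"R", "K"}
FURIN_SAFE_TERMINATORS = ("L", "I", "V")  # hydrophobic aliphatic examples
GENERAL_TERMINATORS = ("F", "W", "L", "I", "V", "A", "M")


def cpb_terminator(protein: str, furin_risk: bool, preferred: Optional[str] = None) -> str:
    """Return the residue to place between the protein of interest and the
    hydrophilic spacer, or "" when no CPB terminator is needed."""
    if not protein or protein[-1] not in BASIC:
        return ""  # only a C-terminal Arg or Lys needs protecting
    allowed = FURIN_SAFE_TERMINATORS if furin_risk else GENERAL_TERMINATORS
    choice = allowed[0] if preferred is None else preferred
    if choice not in allowed:
        raise ValueError(f"{choice!r} is not a listed CPB terminator here")
    return choice


def cpb_digest(seq: str) -> str:
    """Simplified CPB model: release only C-terminal Arg and Lys."""
    while seq and seq[-1] in BASIC:
        seq = seq[:-1]
    return seq


protein = "MSGQKR"  # hypothetical protein ending in Arg
term = cpb_terminator(protein, furin_risk=True)
after_cpb = cpb_digest(protein + term + "RKR")  # spacer removed; CPB stops at the terminator
```

CPB removes the spacer but halts at the terminator, so the authentic carboxy-terminal arginine survives; per the text, the terminator itself is then released quickly by CPA or CPD-Y.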

The term "endoprotease recognition sequence" refers to a defined amino acid sequence 
that allows cleavage of a protein or peptide containing this sequence by an endoprotease. 

The terms "hydrophilic linker" or "linker" refer to a functional unit present on the
fusion proteins of the invention which comprises a hydrophilic spacer and an endoprotease
recognition site; the linker may also contain a CPB terminator and/or a penultimate enhancer
element. The hydrophilic spacer joins or links the protein of interest to the affinity domain.
The term linker is also used to refer to DNA sequences encoding the amino acids comprising
the hydrophilic spacer, endoprotease recognition site, CPB terminator and penultimate
enhancer; it is clear from the context in which this term is used whether the linker comprises
amino acid or DNA sequences. The present invention provides for three levels of hydrophilic
linker (i.e., linker) designs as discussed in detail below.

The term "fusion protein" as used herein refers to a polypeptide which comprises 
protein domains from at least two different proteins. 

The term "control fusion protein" refers to a fusion protein which is generated from a 
recombinant DNA molecule encoding two different protein domains that are joined together 
without the presence of an amino acid sequence comprising the recognition site for a 
site-specific protease.
The term "fusion partner" refers to components of a fusion protein which are fused to
the amino acids comprising the protein of interest; these components include affinity domains
such as portions of Ig molecules, MBP, GST, etc.

The term "carboxy-terminal fusion protein" refers to a fusion protein in which the 
protein of interest is located at the amino-terminal portion of the fusion protein; the fusion
partner components are joined to the carboxy-terminus of the protein of interest. 

The term "amino-terminal fusion protein" refers to a fusion protein in which the 
protein of interest is located at the carboxy-terminal portion of the fusion protein; the fusion 
partner components are joined to the amino-terminus of the protein of interest. 
The term "authentic protein" or "authentic recombinant protein" as used herein refers
to a protein having the same primary amino acid sequence as that encoded by the native gene
sequences, i.e., the recombinant protein does not contain any non-native amino acids. In
contrast, a "non-authentic" protein contains at least one amino acid not found in the naturally
occurring protein (i.e., not encoded by the native gene sequences). During the processing of
the fusion proteins of the present invention, non-authentic proteins containing additional
amino acids (i.e., not encoded by the native gene), typically at the carboxy-terminal end of the
authentic protein sequence, are generated. These additional amino acids are removed using
carboxypeptidase(s) to generate authentic recombinant proteins.

The terms "protein of interest" or "desired protein" as used herein refer to the protein
whose expression is desired within the fusion protein. In a fusion protein the protein of
interest will be joined or fused with another protein or protein domain, the fusion partner, to
allow for enhanced stability of the protein of interest and/or ease of purification of the fusion
protein. In the fusion proteins of the invention, the desired protein or protein of interest may
comprise either the amino- or carboxy-terminal portion of the fusion protein; however, fusion
proteins which contain the protein of interest as the amino-terminal protein of the fusion
protein are particularly preferred.

The terms "authentic protein of interest" or "authentic recombinant protein of interest"
refer to proteins produced by recombinant means which contain only native or
naturally-occurring amino acids.

The term "affinity domain" as used herein refers to a domain present on a fusion
protein which permits purification of the fusion protein on an affinity resin. For example, the
Fc domain of immunoglobulins may be used as the affinity domain on the fusion proteins of
the invention; the domain allows purification of the fusion protein on protein A or protein
G chromatography resins.

The term "signal peptide sequence" refers to an approximately 16-40 amino acid 
stretch present on the amino-terminus of a protein which directs the nascent protein to the 
periplasm (prokaryotic cells) or permits the secretion of the protein (eukaryotic cells). The
signal peptide is cleaved from the protein once the protein has been directed to its desired 
location (i.e., periplasm, secretory granule, etc.). The terms "signal peptide," "signal peptide 
sequence" and "leader sequence peptide" are used interchangeably in the art. 

Many proteins, in particular secretory proteins, are synthesized as larger precursors 
which are cleaved at one or a few specific peptide bonds to produce the mature form of the 
protein. The larger precursor forms are referred to as either preproproteins or proproteins. 
The term "preproprotein" refers to a precursor protein which undergoes at least two successive 
proteolytic cleavages to produce the mature protein. For example, preproalbumin contains an 
18 amino acid signal sequence at the amino-terminus which is cleaved to generate 
proalbumin. Proalbumin is then cleaved to generate albumin. 

The term "proprotein" refers to a precursor protein which undergoes proteolytic 
processing to generate the mature form of the protein. When the active protein is an enzyme 
the precursor is referred to as a "proenzyme" or "zymogen." 

The terms "site-specific protease" or "site-specific endoprotease" are used
interchangeably and refer to an endoprotease which cleaves at a specific set of amino acid 
sequences. For example, the endoprotease renin cleaves between the two leucine residues in 
the following sequence: Pro-Phe-His-Leu-Leu-Val-Tyr (SEQ ID NO:3). 
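A minimal sketch of site-specific cleavage, using the renin example above: the recognition sequence is written in one-letter code (PFHLLVY) and the cut falls between the two leucines. The exact-match search, the choice to cleave only at the first occurrence, the function name and the example fusion sequence are simplifying assumptions; real endoproteases may tolerate sequence variation and are affected by the surrounding structure.

```python
from typing import List

RENIN_SITE = "PFHLLVY"  # Pro-Phe-His-Leu-Leu-Val-Tyr (SEQ ID NO:3), one-letter code
RENIN_CUT = 4           # cut between the two Leu residues: PFHL | LVY


def cleave_first_site(seq: str, site: str = RENIN_SITE, cut: int = RENIN_CUT) -> List[str]:
    """Split `seq` at the first occurrence of `site`, `cut` residues into it.
    Returns [seq] unchanged if the site is absent."""
    start = seq.find(site)
    if start == -1:
        return [seq]
    pos = start + cut
    return [seq[:pos], seq[pos:]]


# Hypothetical fusion: protein + spacer + renin site + start of an affinity domain.
fragments = cleave_first_site("MKTAYIAQRKRPFHLLVYEPKSC")
```

The left fragment keeps PFHL at its carboxy-terminus; residues of this kind are the non-authentic extension that the carboxypeptidase steps described in this document are intended to remove.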

The term "endoprotease" or "endopeptidase" as used herein refers to a protease capable 
of hydrolysing interior peptide bonds of a polypeptide, at points other than the terminal bonds 
(i.e., the peptide bonds of the terminal amino acid).

The term "exoprotease" or "exopeptidase" as used herein refers to a protease capable
of hydrolysing peptide bonds only at the terminal bonds of a polypeptide.

The term "carboxypeptidase" as used herein refers to an exoprotease that hydrolyses 
only the peptide bond of a terminal amino acid containing a free carboxyl group. 
Carboxypeptidases are used to remove amino acids from the carboxy-terminus of a peptide 
chain. 
The term "promoter DNA sequence" as used herein refers to a DNA sequence that
precedes a gene in a DNA polymer and provides a site for initiation of the transcription into
mRNA.

The term "terminator DNA sequence" as used herein refers to a DNA sequence that
follows a gene in a DNA polymer and provides a signal for termination of the transcription
into mRNA.

Eucaryotic expression vectors may also contain "viral replicons" or "viral origins of
replication." Viral replicons are viral DNA sequences which allow for the extrachromosomal
replication of a vector in a host cell expressing the appropriate replication factors. Vectors
which contain either the simian virus 40 (SV40) or polyoma virus origin of replication
replicate to high copy number (up to 10^ copies/cell) in cells that express the appropriate viral
T antigen. Vectors which contain the replicons from bovine papillomavirus or Epstein-Barr
virus replicate extrachromosomally at low copy number (approximately 100 copies/cell).
The term "stable transfection" or "stably transfected" refers to the introduction and
integration of foreign DNA into the genome of the transfected cell. The term "stable
transfectant" refers to a cell which has stably integrated foreign DNA into the genomic DNA.

The term "selectable marker" as used herein refers to the use of a gene which encodes
an enzymatic activity that confers resistance to an antibiotic or drug upon the cell in which
the selectable marker is expressed. Selectable markers may be "dominant"; a dominant
selectable marker encodes an enzymatic activity which can be detected in a cell line.
Examples of dominant selectable markers include the bacterial aminoglycoside 3'
phosphotransferase gene (also referred to as the neo gene) which confers resistance to the
drug G418 in mammalian cells. Additional examples of a dominant selectable marker are the
bacterial hygromycin G phosphotransferase (hyg) gene which confers resistance to the
antibiotic hygromycin and the bacterial xanthine-guanine phosphoribosyl transferase gene (also
referred to as the gpt gene) which confers the ability to grow in the presence of mycophenolic
acid.

Other selectable markers are not dominant in that their use must be in conjunction
with a cell line that lacks the relevant enzyme activity. Examples of non-dominant selectable
markers include the thymidine kinase (tk) gene which is used in conjunction with tk- cell
lines, the CAD gene which is used in conjunction with CAD-deficient cells and the
mammalian hypoxanthine-guanine phosphoribosyl transferase (hprt) gene which is used in
cell lines is provided in Sambrook, J. et al., Molecular Cloning: A Laboratory Manual, pp.
16.9-16.15.

The terms "nucleic acid molecule encoding," "DNA sequence encoding," and "DNA
encoding" refer to the order or sequence of deoxyribonucleotides along a strand of
deoxyribonucleic acid. The order of these deoxyribonucleotides determines the order of
amino acids along the polypeptide (protein) chain. The DNA sequence thus codes for the
amino acid sequence.

DESCRIPTION OF THE INVENTION 

The invention provides reagents and methods which permit the production of authentic 
recombinant proteins (i.e., an authentic protein produced by recombinant means). The 
methods of the invention include the construction of expression vectors which permit the 
expression of fusion proteins capable of isolation by affinity chromatography (i.e., affinity- 
purifiable fusion proteins) in procaryotic or eucaryotic cells. The affinity-purifiable fusion 
proteins comprise the following domains, from amino- to carboxy-termini: 1) the protein of 
interest, 2) a hydrophilic spacer, 3) an endoprotease recognition site and 4) an
affinity-purifiable domain (i.e., the affinity domain). The fusion proteins of the present invention may
contain additional elements such as CPB terminators and/or penultimate enhancers (discussed 
below). It is noted that the hydrophilic spacer and the endoprotease recognition site may 
comprise a single element as discussed in Level 1 linker designs below. 
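The domain order listed above, together with the positions of the optional CPB terminator (between the protein of interest and the spacer) and penultimate enhancer (between the spacer and the endoprotease site) defined earlier, can be captured in a small data structure. This is an illustrative sketch: the class name, the "more than half hydrophilic" reading of "predominantly hydrophilic", and all example sequences (including the short placeholder string standing in for the hinge/Fc affinity domain) are assumptions.

```python
from dataclasses import dataclass

# Hydrophilic residues as listed in the definition of "hydrophilic".
HYDROPHILIC = set("KRHDEGSTCYNQ")


@dataclass(frozen=True)
class FusionDesign:
    """Amino- to carboxy-terminal layout: protein of interest, optional CPB
    terminator, hydrophilic spacer, optional penultimate enhancer,
    endoprotease recognition site, affinity domain."""

    protein: str
    spacer: str
    site: str
    affinity: str
    cpb_terminator: str = ""
    enhancer: str = ""

    def __post_init__(self) -> None:
        if not 1 <= len(self.spacer) <= 8:
            raise ValueError("spacer must be 1-5 residues (6-8 if extended)")
        if "R" not in self.spacer:
            raise ValueError("spacer must contain at least one arginine")
        if 2 * sum(aa in HYDROPHILIC for aa in self.spacer) <= len(self.spacer):
            raise ValueError("spacer must be predominantly hydrophilic")
        if self.protein.endswith(("R", "K")) and not self.cpb_terminator:
            raise ValueError("a C-terminal Arg/Lys in the protein needs a CPB terminator")

    def sequence(self) -> str:
        """Full fusion protein sequence, amino- to carboxy-terminus."""
        return (self.protein + self.cpb_terminator + self.spacer
                + self.enhancer + self.site + self.affinity)


design = FusionDesign(protein="MKTAYIAQ", spacer="RKR", site="PFHLLVY",
                      affinity="EPKSCDKTHT", enhancer="L")
```

Validating in `__post_init__` means an object that violates one of the stated spacer or terminator rules cannot be constructed at all.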

In order to produce authentic recombinant protein from the fusion proteins of the 
present invention, the fusion proteins are expressed in an appropriate host cell, purified by 
affinity chromatography and then processed to remove the affinity domain, the endoprotease 
site and hydrophilic spacer (and any additional elements present which comprise amino acids 
not present in the authentic protein). The removal of amino acids comprising the 
endoprotease site and hydrophilic spacer is accomplished using carboxypeptidases. 
Carboxypeptidases are enzymes which remove (i.e., hydrolyze) protein chains beginning at the
carboxy-terminal end of the chain and liberate amino acids one at a time. In the methods of 
the present invention, various carboxypeptidases are used singly, sequentially, or in 
combination to generate authentic proteins from the fusion proteins of the present invention. 
The processing of the fusion proteins of the invention is described in detail below. 
residues. The hydrophilic spacers serve several functions. The hydrophilic amino acids
which comprise the hydrophilic spacer serve to orient this portion of the fusion protein toward
the exterior of the molecule in aqueous solutions; this increases the exposure and accessibility
of the nearby endoprotease recognition site. The hydrophilic spacers also allow for the
physical separation of the domain comprising the protein of interest from the affinity domain.
This separation ensures that the affinity domain is free to interact with the affinity resin, as the
possibility of steric hindrance from the protein of interest is reduced. In addition, the
hydrophilic spacers allow for the physical separation of the endoprotease recognition site from
the carboxy-terminal portion of the protein of interest. This separation is advantageous as the
carboxy-terminal portion of the protein of interest may limit access of the endoprotease to the
endoprotease recognition site if located in close proximity.

The fusion proteins comprising the hydrophilic spacer and endoprotease recognition
site are purified using an affinity resin which binds to the affinity domain of the fusion
protein. The affinity domain is generally removed from the purified fusion protein by
digestion with the endoprotease whose recognition site is present in the hydrophilic
spacer/endoprotease recognition site domain of the fusion protein (the affinity domain may
also be removed from the fusion protein by chemical cleavage using methods known to the
art). The domain comprising the cleaved protein of interest (i.e., that portion of the fusion
protein containing the protein of interest following digestion of the fusion protein with an
endoprotease) is then processed to remove any amino acids which comprise the hydrophilic
spacer and/or the endoprotease recognition site. Digestion with the endoprotease may occur
while the fusion protein is still bound to the affinity resin or, alternatively, the fusion protein
may be eluted from the affinity resin and then digested with the endoprotease. When the
fusion protein is eluted from the affinity chromatography column prior to digestion with the
endoprotease, the cleaved affinity domain may be removed from the cleaved protein of
interest by selective binding to the affinity resin. The efficiency of the endoproteolytic or
chemical cleavage of the recombinant fusion protein is determined by the amino acid
sequence located at the junction between the fusion partner and the protein of interest.

The cleaved protein of interest may contain amino acids at the carboxy-terminus which
comprise all or a portion of the hydrophilic spacer and/or endoprotease recognition site.
These amino acids are sequentially removed from the carboxy-terminus of the protein of
interest by digestion with carboxypeptidases to generate an authentic protein of interest.
Carboxypeptidases comprise a group of enzymes that hydrolyze peptide bonds one amino acid
at a time, from the carboxy-terminus of a polypeptide.

In the present invention, synthetic oligonucleotides (termed "linkers") are used to join
sequences coding for the protein of interest to sequences encoding the affinity domain. By
varying the triplet DNA sequence representing specific amino acids (i.e., codons) in the linker
design, it is possible to create restriction sites for enzymes that recognize and cleave those
designed sequences without changing the amino acid sequence of the encoded protein of
interest. The use of sequences encoding recognition sites for restriction enzymes having a
minimum of 6 bases in the recognition site is preferred; this reduces the chance that multiple
restriction enzyme cleavage sites will be present in both the vector and the sequences
encoding the protein of interest.
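The codon-variation approach above can be illustrated by enumerating the synonymous DNA sequences for a short peptide and keeping those that contain a 6-base restriction site. The sketch uses only a small subset of the standard genetic code (Glu, Phe and Leu) and two well-known sites, EcoRI (GAATTC, which encodes Glu-Phe in frame) and XhoI (CTCGAG, which encodes Leu-Glu in frame); the function name is hypothetical.

```python
from itertools import product
from typing import Dict, List

# Small subset of the standard genetic code (assumption: only these residues needed).
SYNONYMOUS_CODONS: Dict[str, List[str]] = {
    "E": ["GAA", "GAG"],                              # Glu
    "F": ["TTT", "TTC"],                              # Phe
    "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],  # Leu
}


def silent_variants_with_site(peptide: str, site: str) -> List[str]:
    """Return every DNA sequence encoding `peptide` (using the codon subset
    above) that contains the restriction site `site`."""
    hits = []
    for codons in product(*(SYNONYMOUS_CODONS[aa] for aa in peptide)):
        dna = "".join(codons)
        if site in dna:
            hits.append(dna)
    return hits


ecori_variants = silent_variants_with_site("EF", "GAATTC")  # Glu-Phe with an EcoRI site
xhoi_variants = silent_variants_with_site("LE", "CTCGAG")   # Leu-Glu with an XhoI site
```

Each returned sequence encodes the same dipeptide as every other synonymous choice, so swapping it into the linker adds the site without altering the protein.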

The ends which result from the digestion of DNA by restriction endonucleases may be
joined if the overhanging ends are compatible (i.e., capable of hybridizing). The ends
produced by restriction digests that leave blunt DNA ends are compatible with all other
blunt-ended DNA. Ends may be compatible as a result of digestion with isocaudomers, or they
may be made compatible by partially or completely filling in the ends using the Klenow enzyme
or T4 DNA polymerase. Ligation of a pair of filled-in ends generally does not recreate either
restriction site, but this technique greatly increases the possible combinations of sequences
that can be combined. Overhanging termini produced by digestion of DNA with restriction
endonucleases may be removed to generate blunt ends by treatment of the DNA with S1
nuclease.
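Compatibility of cohesive ends can be checked mechanically: two 5' overhangs can anneal when one is the reverse complement of the other, and blunt ends pair with any blunt end. The sketch below handles only palindromic sites that leave 5' overhangs or blunt ends. BamHI (G^GATCC) and BglII (A^GATCT), which leave the same GATC overhang, and the blunt cutters SmaI (CCC^GGG) and EcoRV (GAT^ATC) are standard examples; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(dna: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return dna.translate(COMPLEMENT)[::-1]


def five_prime_overhang(site: str, cut: int) -> str:
    """Overhang left when a palindromic `site` is cut `cut` bases in on each
    strand. Only 5' overhangs and blunt cuts are handled; "" means blunt."""
    if not 0 <= cut <= len(site) // 2:
        raise ValueError("this sketch handles only 5' overhangs and blunt cuts")
    return site[cut:len(site) - cut]


def compatible(overhang_a: str, overhang_b: str) -> bool:
    """5' overhangs anneal if one is the reverse complement of the other;
    two blunt ends ("" and "") are always compatible."""
    return overhang_a == revcomp(overhang_b)


def ligated_junction(left_site: str, left_cut: int, right_site: str, right_cut: int) -> str:
    """Top-strand sequence formed by joining the left fragment's end to the
    right fragment's end."""
    oh_left = five_prime_overhang(left_site, left_cut)
    oh_right = five_prime_overhang(right_site, right_cut)
    if not compatible(oh_left, oh_right):
        raise ValueError("ends are not compatible")
    # Left top strand stops at its cut; the right top strand begins with its overhang.
    return left_site[:left_cut] + oh_right + right_site[len(right_site) - right_cut:]


junction = ligated_junction("GGATCC", 1, "AGATCT", 1)  # BamHI end joined to BglII end
```

The hybrid BamHI/BglII junction (GGATCT) matches neither original site, a small illustration of how joining compatible ends can destroy both recognition sequences.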

An example of this technique would be the joining of the DNA coding sequences for
proteins 1 and 2 (Genes 1 and 2), using synthetic DNA, such that the resulting fusion is
oriented 5' 1-2 3' in its open reading frame. A restriction site close to the 3' end of the Gene 1
sequence is determined by analysis of the nucleic acid sequence. Preferentially, the enzyme
of choice will produce an overhang to facilitate cloning. Similar analysis is performed for the
5' sequence of Gene 2. Once the restriction sites have been determined, synthetic
oligonucleotides are designed to be complementary to each other and code for the sequence
that is removed as the result of the restriction digest. Hybridized oligonucleotides (comprising
the linker) will have compatible overhangs for ligation to the 3' end of Gene 1 and the 5' end
of Gene 2. The linkers are phosphorylated, hybridized and ligated to Gene 1. Restriction digests are
used to cleave off multiple oligonucleotides and generate compatible overhangs for Gene
1 molecules ligated to linker. Gene 1 with ligated linker is then ligated directly to Gene 2.
The resulting fusion can be identified and isolated using restriction digestion and isolation of
the desired product on a low melting temperature agarose gel.

Synthetic DNA can also be used to change or add sequences between the two
sequences encoding the two protein domains by ligating an oligonucleotide comprising
the desired sequence between the two protein domain-encoding sequences. Site-directed
mutagenesis can be used to create restriction sites at or near the termini of the sequences to be
joined.

PCR is also an effective tool for cloning known genes (into blunt or sticky sites).
Primers can code for 25-40 bases of known sequence and the resulting PCR product can be
cloned into a digested vector having blunt ends by removing any possible 3' overhangs with
T4 DNA polymerase. Another method of linking sequences with the use of the PCR reaction
is to create restriction sites at the end(s) of the amplified DNA. These restriction sites are
easily added to the 5' ends of the primers used for amplification. Digestion of the purified
PCR products will produce ends for ligation to other DNA having compatible termini.
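Adding restriction sites to the 5' ends of PCR primers can be sketched as follows. The 25-base annealing region reflects the 25-40 bases mentioned above; the two-base "GC" clamp placed outside each site, the EcoRI and BamHI sites, the function name and the test gene are illustrative assumptions, not part of the described method.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(dna: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return dna.translate(COMPLEMENT)[::-1]


def primers_with_sites(
    gene: str, fwd_site: str, rev_site: str, anneal: int = 25, clamp: str = "GC"
) -> Tuple[str, str]:
    """Forward and reverse primers, both written 5'->3', that amplify `gene`
    and add one restriction site to each end of the product."""
    if len(gene) < anneal:
        raise ValueError("gene is shorter than the annealing region")
    # Forward primer: extra bases + site + first `anneal` bases of the top strand.
    forward = clamp + fwd_site + gene[:anneal]
    # Reverse primer anneals to the top strand's 3' end, so it carries the
    # reverse complement of the last `anneal` bases.
    reverse = clamp + rev_site + revcomp(gene[-anneal:])
    return forward, reverse


gene = "ATG" + "GCTAGCAAAGGT" * 3 + "TAA"  # hypothetical 42-base coding sequence
fwd, rev = primers_with_sites(gene, "GAATTC", "GGATCC")  # EcoRI and BamHI sites
```

Digesting the resulting product with the two enzymes would leave ends that can be ligated to DNA with compatible termini, as described above.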

In a preferred embodiment, the invention comprises a vector for the production of
recombinant proteins in procaryotic or eukaryotic hosts comprising: (1) a controllable
transcriptional promoter which, upon activation (by induction or release of repression), directs
the transcription of large amounts of mRNA from the cloned gene; (2) translational control
sequences, such as a ribosome binding site; (3) a prokaryotic or eukaryotic signal sequence
which directs the transport of the protein across the inner membrane into the periplasmic
space in bacterial host cells (in a eukaryotic host cell, the signal sequence directs the secretion
of the protein); (4) a DNA sequence encoding a protein of interest; (5) a linker sequence
which encodes a hydrophilic amino acid sequence (e.g., a hydrophilic linker which encodes a
hydrophilic spacer) attached to the 3' end of the sequences encoding the protein of interest;
(6) sequences encoding an endoprotease recognition (i.e., cleavage) site; and (7) a DNA
sequence encoding an affinity domain (e.g., at least a portion of the hinge and Fc domains of
an immunoglobulin molecule) attached to the 3' end of the sequences encoding the linker.
The fusion protein produced by such a vector will comprise the protein of interest at the
amino-terminus of the fusion protein followed by the hydrophilic spacer and endoprotease
site; the immunoglobulin hinge and Fc domains will form the carboxy-terminus of the fusion
protein. The fusion protein may also contain a CPB terminator and/or a penultimate enhancer.
Transcriptional Control Of Recombinant Fusion Protein Expression

The mere insertion of a gene sequence into a vector is generally insufficient to permit 
the expression of an exogenous gene in a host cell line. The structural gene sequences must 
be operably linked to appropriate transcriptional control signals to permit expression of the 
encoded protein in either a prokaryotic or a eukaryotic host. In prokaryotes, a number of
promoter sequences have been identified. Of particular use are those promoters which can be 
controlled (an "inducible" promoter); transcription from an inducible promoter occurs at low 
levels unless a particular molecule is present or until a repressor of the promoter is removed. 
Examples of these types of inducible promoters include: 

1) The tac promoter is a hybrid of the trp and lac promoters [Amann, E. et al.,
Gene 40:183 (1985); de Boer, H.A. et al., Proc. Natl. Acad. Sci. USA 80:21 (1983)]. The tac
promoter is regulated by the lac repressor. Transcription from the tac promoter is repressed
in E. coli strains, such as RB791 (ATCC No. 53622), which make high levels of the lac
repressor. The lac repressor may be provided by placing a copy of the lacIq gene on the
plasmid carrying the gene of interest; this allows for host-independent repression of the tac
promoter. Transcription from the tac promoter is induced (i.e., repression is relieved) by the
addition of isopropylthio-β-D-galactoside (IPTG);

2) The bacteriophage λ promoter which is regulated by a temperature-sensitive
repressor, cIts857 [Sambrook, J. et al., supra, p. 17.11]. Repression occurs at low
temperatures (30°C) and is relieved by a shift to higher temperatures (40-45°C). The ability
to use heat to induce expression from a promoter is advantageous in terms of cost; no
compounds need be added to the culture. However, the shift to a higher temperature may
also activate heat shock proteins, some of which are proteases. This potential drawback
may be eliminated by selecting a host strain which is deficient in the expression of these
proteases. For example, E. coli strains Y1089r- (Stratagene) and BL21 (Novagen) are
deficient in expression of the La protease due to mutations in the lon gene. Expression of the
La protease is induced by heat shock. E. coli strains carrying mutations in the lon gene have
been shown to limit proteolysis of intracellular proteins [Buell, G., et al., Nucleic Acids Res.,
13:1923 (1985)]. Alternative means of induction of the λ PL promoter include the use of
mitomycin C or nalidixic acid, neither of which induces heat shock proteins;

3) The bacteriophage T7 promoter [Studier, F.W. and Moffatt, B.A., J. Mol. Biol.
189:113 (1986) and Tabor, S. and Richardson, C.C., Proc. Natl. Acad. Sci. USA
T7 promoter requires a two-component system: T7 RNA polymerase, which can be provided
from a copy of the gene inserted into the E. coli chromosome or on an infecting bacteriophage
λ vector, and a plasmid vector containing the T7 promoter upstream of the gene to be expressed.
The above-described promoters are preferred, as they are known to direct high levels
of transcription in prokaryotic hosts. However, many other prokaryotic promoters are known
in the art and the invention is not limited by the choice of promoter selected.

Transcriptional control signals in eukaryotes comprise promoter and enhancer
elements. Promoters and enhancers consist of DNA sequences that interact specifically with
proteins involved in transcription [Maniatis, T., et al., Science 236:1237 (1987)]. These
elements have been isolated from a variety of sources including genes in yeast, insect and
mammalian cells and viruses. The selection of a particular promoter and enhancer depends on
the cell type which is to be used to express the protein of interest. Some eukaryotic
promoters and enhancers have a broad host range while others are functional in a limited
subset of cell types [for review see Voss, S.D., et al., Trends Biochem. Sci. 11:287 (1986)
and Maniatis, T., et al. (1987), supra]. For example, the SV40 early gene enhancer is very
active in a wide variety of cell types from many mammalian species and has been widely
used for the expression of proteins in mammalian cells [Dijkema, R., et al., EMBO J. 4:761
(1985)]. Two other examples of promoter/enhancer elements active in a broad range of
mammalian cell types are those from the long terminal repeats of the Rous sarcoma virus
[Gorman, C.M., et al., Proc. Natl. Acad. Sci. USA 79:6777 (1982)] and from the human
cytomegalovirus [Boshart, M., et al., Cell 41:521 (1985)]. The SV40 enhancer/promoter and
the CMV enhancer/promoter are preferred transcriptional control sequences when the protein
is to be expressed in mammalian cells.

Efficient expression of recombinant DNA sequences in eukaryotic cells requires signals
directing the efficient termination and polyadenylation of the resulting transcript.
Transcription termination signals are generally found downstream of the polyadenylation
signal and are a few hundred nucleotides in length. The term "poly A site" or "poly A
sequence" as used herein denotes a DNA sequence which directs both the termination and
polyadenylation of the nascent RNA transcript. Efficient polyadenylation of the recombinant
transcript is desirable, as transcripts lacking a poly A tail are unstable and are rapidly
degraded. The poly A signal utilized in an expression vector may be "heterologous" or
"endogenous." An endogenous poly A signal is one that is found naturally following the
isolated from one gene and placed 3' of another gene. A commonly used heterologous poly
A signal is the SV40 poly A signal.

When the fusion proteins of the present invention are to be expressed in mammalian or
insect cell lines, stable transformants containing DNA sequences encoding the fusion protein
are preferably generated. However, the invention is not limited to the expression of fusion
proteins in stably transformed cells. Several transient transfection systems known in the art
may be employed for the expression of the fusion proteins of the present invention.
For example, an expression vector containing the SV40 origin of replication may be used in
conjunction with a cell line which stably expresses the SV40 T antigen, such as the COS-1 or
COS-7 cell lines, for the expression of fusion proteins in mammalian cells.
Vectors which contain the SV40 origin of replication will replicate to high copy number in
host cells which express the SV40 large T antigen, such as the COS-1 (ATCC CRL 1650)
[Gluzman, Y., Cell 23:175 (1981)] and COS-7 (ATCC CRL 1651) [Gluzman, supra] cell
lines. Vectors containing the polyoma virus origin of replication will replicate to high copy
number in cells, such as WOP cells, which express polyoma virus large T antigen [Dailey, L.
and Basilico, C., J. Virol. 54:739 (1985)]. Another example of a replicating transient
transfection system is the bovine papilloma virus (BPV) system.

Use Of Signal Peptides To Translocate Expressed Proteins

Sequences encoding signal peptides may be joined to sequences encoding the fusion
proteins of the present invention. The use of a signal sequence may be advantageous for
expression of recombinant proteins in either prokaryotic or eukaryotic hosts. Secretion signals
are relatively short (16-40 amino acids) in most species. The presence of a signal sequence
on the protein permits the transport of the protein into the periplasm (prokaryotic hosts) or the
secretion of the protein (eukaryotic hosts). Signal sequences from bacterial or eukaryotic
genes are highly conserved in terms of function, although not in terms of sequence, and many
of these sequences have been shown to be interchangeable [Gray, G.L. et al., Gene 39:247
(1985)].

In prokaryotes, the signal sequence directs the nascent protein across the inner
membrane into the periplasmic space. It has been found that transport to the periplasm will
allow proper folding of some proteins which cannot fold properly in the cytoplasm. Transport
to the periplasmic space also functions as a partial purification step, as the periplasm
a mild osmotic shock of the bacterial cells. E. coli cells which express the kil gene product
may be used to achieve the secretion of proteins transported to the periplasm without the need
for cell lysis or osmotic shock [Kobayashi, T., et al., J. Bacteriol. 166:728 (1986)]. The kil
gene product causes an increase in the permeability of the outer membrane, allowing the
secretion of periplasmic proteins into the culture medium.

The presence of a signal sequence on a protein expressed in a eukaryotic host results
in the transport of the nascent protein across the membrane into the lumen of the rough
endoplasmic reticulum, which may allow for eventual secretion of the protein into the culture
medium. In both prokaryotes and eukaryotes, the signal sequence is removed from the
amino-terminus of the protein molecule by enzymatic cleavage during transport of the
polypeptide through the membrane.

While some signal sequences have been shown to be interchangeable, the use of
specific signal sequences in a particular host may increase expression of the fusion protein.
For example, when the fusion protein to be expressed comprises a human pre-protein and the
host cell is a bacterial cell, the naturally occurring human secretion signal is replaced with an
efficient bacterial signal sequence. Among the preferred bacterial signal sequences are those
derived from the β-lactamase and phosphatase (phoA) genes that have been genetically
engineered or synthesized to have an NcoI or NdeI site at the ATG start codon and another
restriction site at the 3' end of the signal sequence to be used to link the DNA encoding the
mature protein of interest. A phoA-mediated expression system which utilizes the pho signal
sequence followed by a multiple cloning site has been reported [Oka, T., et al., Proc. Natl.
Acad. Sci. USA 82:7212 (1985)].

Immunoglobulin Hinge / Fc domains And Other Fusion Partners 

The expression of exogenous gene products in host cell lines is facilitated by the use
of fusion proteins in which the protein of interest is linked via a hydrophilic spacer
sequence to a fusion partner (as discussed below, the spacer comprises hydrophilic amino
acid residues). The fusion partner functions to stabilize the protein of interest as well as to
provide a domain which permits the affinity chromatographic purification of the
recombinant protein. The present invention is not limited by the nature of the affinity
domain chosen.

A preferred affinity domain (i.e., fusion partner) comprises the immunoglobulin hinge
IgG1 is given in SEQ ID NOS: 49 and 50, respectively). The use of these protein domains is
advantageous. The hinge region of immunoglobulin molecules is known to be flexible
and accessible to proteases; sites for cleavage by papain and pepsin are present in the hinge
region. The use of the flexible hinge region to join the protein of interest with the ligand for
the affinity matrix, the Fc region, may allow the independent folding of the two domains.
Protein A- and Protein G-Sepharose (Pharmacia Biotech) bind to the Fc domain of
immunoglobulin G of many species with high affinity, allowing for the purification of the
fusion protein. Other classes of immunoglobulins, such as IgM, IgA and IgE, may be used as
the donor of the Fc region and purified on anti-IgM, anti-IgA or anti-IgE resins.

It is known that IgG Fc can be expressed in E. coli and that proper disulfide bond
formation occurs when the protein is directed across the inner membrane into the periplasmic
space [Kitai, K., et al., Appl. Microbiol. Biotechnol. 28:52 (1988)]. The hinge and Fc
domains of IgG were used to create a CD4/IgG fusion protein for therapeutic use in humans
[Capon, D.J. et al., Nature 337:525 (1989) and Mayforth, R.D. and Quintans, J., N. Engl. J.
Med. 323:173 (1990)]. The CD4/IgG fusion protein was produced in a human embryonic
kidney-derived cell line. The CD4/IgG fusion protein was not designed to be cleaved into its
separate protein components, since the investigators fused the IgG sequences to a soluble
form of CD4 to increase the half-life of soluble CD4 in the serum of patients.

While the use of the hinge and Fc regions of immunoglobulin molecules is
advantageous for the reasons discussed above, the invention is not limited by the use of these
immunoglobulin regions as a means to affinity purify the fusion protein. The invention
contemplates the improvement of other protein fusion systems which use other means of
providing an affinity-purifiable domain on the fusion protein. For example, sequences
encoding the novel hydrophilic spacers of the invention may be inserted between the
sequences encoding the malE gene product, which provides the MBP domain for affinity
purification on amylose resins, and the protein of interest. It is desirable that the protein of
interest be expressed as the amino-terminal portion of the fusion protein; in contrast, existing
MBP fusion systems express the MBP domain at the amino-terminus of the fusion. As
discussed below, there are advantages to having the protein of interest emerge from the
ribosome first.

The invention also contemplates the use of the novel hydrophilic spacers (described in
detail below) joined to the hinge region of an immunoglobulin which

Again, the protein of interest is inserted in front of the spacer sequences such that the protein
of interest forms the amino-terminal domain and the affinity ligand-binding domain forms the
carboxy-terminus of the fusion protein. The addition of the hydrophilic spacer and the hinge
region of an immunoglobulin would greatly improve the efficiency of cleavage of existing
fusion systems and provide a means to consistently generate authentic recombinant proteins.

Additionally, the invention contemplates the improvement of existing fusion/cleavage
systems by the addition of the novel hydrophilic spacers of the invention to a cleavage site for
a site-specific endoprotease. The hydrophilic spacer is added to the amino-terminal side of
the endoprotease cleavage site so that an authentic carboxy-terminus of the protein of interest
may be generated. Again, the fusion protein is preferably designed so that the protein of
interest is located on the amino-terminal side of the hydrophilic spacer. This allows for the
generation of authentic recombinant proteins following endoproteolytic cleavage and
carboxypeptidase digestion. Existing cleavage/fusion systems which express the
affinity-purifiable domain at the amino-terminal end of the fusion protein may also be further
modified to express the protein of interest at the amino-terminal domain of the fusion protein.
However, even if this is not done, the addition of the hydrophilic spacer to the site-specific
endoprotease cleavage site is still an improvement to existing cleavage/fusion systems, since
the hydrophilic spacer increases the efficiency of cleavage. The spacer increases the physical
separation between the protein of interest and the endoprotease cleavage site and thereby
increases the accessibility of the cleavage site to the endoprotease.

The invention also contemplates the use of the novel hydrophilic spacers followed by a
cleavage site for a site-specific endoprotease followed by a hydrophilic domain other than the
hinge region of an immunoglobulin followed by an affinity domain. It is not necessary that
only the hinge region of an immunoglobulin molecule be used to provide an
endoproteolytically susceptible domain which allows for increased accessibility of the
cleavage site to the endoprotease.

For example, an endoproteolytically susceptible stretch comprising the sequence
Gln-Gly-Pro-Gly-(Gln-Lys)n (SEQ ID NO:90), where n equals 1 to 5 and where n equals 3 to
5 is preferred, may be used to separate the protein of interest from an affinity domain other
than the hinge/Fc region of an immunoglobulin, such as β-galactosidase [Germino, J. and
Bastia, D., Cell 32:131-140 (1983)], the B domain of staphylococcal protein A, the S-peptide
(Smith, supra) and the mature streptavidin gene product [aa 1-160, Argarana, C.E., et al.,
Nucleic Acids Res. 14:1871-1882 (1986)]. This stretch was designed to be both hydrophilic
(the use of glutamine and lysine residues enhances solubility) and to expose the proteolytic
site for the chosen endoprotease. This sequence is used to provide a stretch of amino
acids which serves a function similar to that provided by the hinge region of the
immunoglobulin molecule when the affinity domain chosen is not the hinge/Fc domain of
IgG. When an endoproteolytically susceptible stretch such as SEQ ID NO:90 is employed,
the fusion protein will comprise the following domains in order, from the amino- to
carboxy-terminus: a domain comprising the protein of interest, a domain comprising a
hydrophilic spacer, a domain comprising an endoproteolytically susceptible stretch (e.g.,
SEQ ID NO:90), and the chosen affinity domain (other than a portion of IgG).
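The Gln-Gly-Pro-Gly-(Gln-Lys)n family can be enumerated directly from its definition. The Python sketch below is illustrative only; the function names are hypothetical, and it simply expands the repeat for an allowed n and converts the result to one-letter code. With n = 3 it yields the ten-residue spacer that the hGH example lists as SEQ ID NO:8.

```python
# One-letter codes for the residues that occur in this spacer.
THREE_TO_ONE = {"Gln": "Q", "Gly": "G", "Pro": "P", "Lys": "K"}


def qgpg_spacer(n: int) -> list:
    """Return Gln-Gly-Pro-Gly-(Gln-Lys)n as three-letter residue names.

    The text allows n = 1 to 5, with n = 3 to 5 preferred.
    """
    if not 1 <= n <= 5:
        raise ValueError("n must be between 1 and 5")
    return ["Gln", "Gly", "Pro", "Gly"] + ["Gln", "Lys"] * n


def to_one_letter(residues: list) -> str:
    """Convert three-letter residue names to a one-letter string."""
    return "".join(THREE_TO_ONE[r] for r in residues)


spacer = qgpg_spacer(3)
print("-".join(spacer))       # Gln-Gly-Pro-Gly-Gln-Lys-Gln-Lys-Gln-Lys
print(to_one_letter(spacer))  # QGPGQKQKQK
```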

For example, the spacer comprising Gln-Gly-Pro-Gly-(Gln-Lys)n may be used in the
production of recombinant human growth hormone (hGH) in E. coli using the GST protein as
a carboxy-terminal affinity tail (hGH is used as the protein of interest for illustrative purposes;
any protein of interest may be produced as described herein using an affinity domain other
than the Fc or hinge domain of IgG). The carboxy-terminal phenylalanine of hGH is linked to
a thrombin site using a hydrophilic spacer sequence [e.g., Arg-Arg-Lys-Lys-Lys (SEQ ID
NO:32)]. The carboxyl side of the thrombin site is linked to the amino-terminus of the GST
protein with the above-described spacer [Gln-Gly-Pro-Gly-Gln-Lys-Gln-Lys-Gln-Lys (SEQ ID
NO:8)]. The resulting fusion protein is very soluble and the thrombin site is extremely
vulnerable to the endoprotease (thrombin), resulting in very efficient separation of hGH from
the fusion partner. Authentic hGH is generated by carboxypeptidase digestion of the
remaining thrombin recognition sequence and the hydrophilic spacer.
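The domain order of the hGH/GST construct, and what thrombin leaves behind on hGH, can be checked with a short sequence model. This Python sketch rests on stated assumptions: hGH and GST are replaced by hypothetical placeholder strings (AAAAF, ending in the Phe of hGH, and MAAAA), and thrombin is modeled only by the Leu-Val-Pro-Arg-Gly-X rule (X non-acidic, cut after Arg) given in the endoprotease list. It shows the residues that remain on the released hGH and that carboxypeptidase digestion must remove.

```python
import re

# Hypothetical placeholders, NOT real sequences: only the C-terminal Phe of hGH
# and the order of the domains follow the text.
HGH_PLACEHOLDER = "AAAAF"
GST_PLACEHOLDER = "MAAAA"

HYDROPHILIC_SPACER = "RRKKK"  # Arg-Arg-Lys-Lys-Lys (SEQ ID NO:32)
THROMBIN_SITE = "LVPRG"       # Leu-Val-Pro-Arg-Gly; X is the next residue
QGPG_SPACER = "QGPGQKQKQK"    # SEQ ID NO:8

fusion = (HGH_PLACEHOLDER + HYDROPHILIC_SPACER + THROMBIN_SITE
          + QGPG_SPACER + GST_PLACEHOLDER)


def thrombin_cuts(seq: str) -> list:
    """Cut positions after Arg in Leu-Val-Pro-Arg-Gly-X, X not Asp or Glu."""
    return [m.start() + 4 for m in re.finditer(r"(?=LVPRG[^DE])", seq)]


cut = thrombin_cuts(fusion)[0]
released, partner = fusion[:cut], fusion[cut:]
remnant = released[len(HGH_PLACEHOLDER):]
print(released)  # AAAAFRRKKKLVPR
print(remnant)   # RRKKKLVPR, left for carboxypeptidase digestion
```

Note that the Gln of the Gln-Gly-Pro-Gly spacer supplies the non-acidic X residue of the thrombin site in this arrangement.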

Vectors containing DNA sequences encoding the following proteins, which may be
employed as affinity domains, are commercially available: β-galactosidase (the lacZ gene
product), the B domain of staphylococcal Protein A, the maltose binding protein (MBP) (the
malE gene product) and Schistosoma japonicum glutathione-S-transferase. Vectors containing
the lacZ gene sequences are available from Pharmacia Biotech (pCH110 and pMC1871;
GenBank Accession Nos. U13845 and L08936, respectively). Fusion proteins containing
β-galactosidase sequences can be affinity purified on
aminophenyl-β-D-thiogalactosidyl-succinyldiaminohexyl-Sepharose. Vectors containing
Schistosoma japonicum glutathione-S-transferase (GST) gene sequences are
sequences can be affinity purified on glutathione resins [e.g., glutathione Sepharose 4B
(Pharmacia Biotech)]. Vectors containing malE gene sequences encoding the MBP are
available from New England Biolabs (pMAL-c2 and pMAL-p2). Fusion proteins containing
MBP sequences can be affinity purified on amylose resin (New England Biolabs). A vector
containing sequences encoding the IgG binding domains of Protein A is available from
Pharmacia Biotech (pRIT2T; GenBank Accession No. U13864). Fusion proteins containing
the IgG binding domains of Protein A can be affinity purified on IgG resins [e.g., IgG
Sepharose 6FF (Pharmacia Biotech)].

When any of the above-listed proteins (including the hinge/Fc domains of human
IgG1) are used as affinity domains, it is not required that the entire protein be used as the
affinity domain. Portions of these proteins may be used as the affinity domain provided the
portion selected is sufficient to permit interaction of a fusion protein containing the portion of
the protein used as the affinity domain with the desired affinity resin.

Site-Specific Endoproteases

The fusion proteins of the present invention comprise a protein of interest linked to an
affinity domain via a hydrophilic spacer and an endoprotease site. Following affinity
purification of the fusion protein, the affinity domain is removed from the fusion protein by
endoproteolytic cleavage. Amino acid sequences which remain on the carboxy-terminal end
of the protein of interest (derived from the endoprotease cleavage site and/or the hydrophilic
spacer) are then removed by treatment with carboxypeptidase(s), as discussed below.

The following are preferred site-specific endoproteases: 

1) Papain, which cleaves on the carboxy-terminal side of Arg-X, Lys-X, His-X and
Phe-X (where X is any amino acid) [Carrey, E.A. (1989) Protein Structure: A Practical
Approach, T.E. Creighton ed., IRL Press, Oxford, pp. 117]. Papain is preferred for cleavage
of fusion proteins when the protein of interest is linked directly to the hinge region of an
immunoglobulin molecule and is not susceptible to papain cleavage in its natural folded state.
The hinge region is naturally accessible to papain, and cleavage occurs following the
histidine residue at position 225 of human IgG1 (see Figure 10). Papain is a relatively mild
protease, is commercially available in a highly purified form, and is available attached to solid
supports (Sigma). The advantage of using a protease attached to a solid support is that this
allows the complete and easy removal of the protease following digestion.

2) Protease VII and the ompT protease from E. coli, which cleave between
Arg-Arg, Lys-Lys and Lys-Arg residues [Sugimura, K. and Higashi, N., J. Bacteriol.
170:3650 (1988) and Grodberg, J. and Dunn, J., J. Bacteriol. 170:1245 (1988)].

3) Clostropain, which cleaves on the carboxy-terminal side of arginine residues,
with the preferred sequence being Arg-Tyr.

4) Trypsin, which cleaves on the carboxy-terminal side of arginine and lysine
residues.

5) Yeast protease Kex2 [Julius, D., et al., Cell 37:1075-1089 (1984)], which
recognizes and cleaves at the carboxy side of the paired basic residues Lys-Arg and Arg-Arg.

6) Kallikrein, which preferentially cleaves on the carboxy-terminal side of arginine
within the recognition sequence Phe-Arg-Ser-Val (SEQ ID NO:9). When kallikrein is used as
the protease for cleavage, the preferred linker sequence is Val-Pro-Phe-Arg-Ser (SEQ ID
NO:10). The valine residue present in SEQ ID NO:10 functions as a penultimate enhancer,
thereby enhancing the removal of the proline residue by CPD-Y.
7) Thrombin, which cleaves on the carboxy-terminal side of arginine in the
following sequence: Leu-Val-Pro-Arg-Gly-X, where X is a non-acidic amino acid (SEQ ID
NO:11) [Chang, Eur. J. Biochem. 151:217 (1985)].

8) Xenopus laevis skin Arg-X'-Val-Arg-Gly (SEQ ID NO:12) endoprotease, which
cleaves between the arginine and glycine residues, with the preferred X' being Leu, Phe, Ile,
Val, Ala or Trp [Kuks, P., et al., J. Biol. Chem. 264:14609 (1989)].

9) Factor Xa, which cleaves on the carboxy-terminal side of the arginine residue in the
following sequences: Ile-Glu-Gly-Arg-X (SEQ ID NO:4), Ile-Asp-Gly-Arg-X (SEQ ID NO:5),
and Ala-Glu-Gly-Arg-X (SEQ ID NO:6), where X is any amino acid except proline or
arginine.

10) Enterokinase, which cleaves after the lysine residue in the following sequence:
Asp-Asp-Asp-Asp-Lys (SEQ ID NO:13).

11) Renin, which cleaves between the leucine residues in the following sequence:
Pro-Phe-His-Leu-Leu-Val-Tyr (SEQ ID NO:3).

12) Collagenase, which cleaves following the X residue in the following sequence:
Pro-X-Gly-Pro-Y, where X and Y are any amino acids (SEQ ID NO:2) [Steinbrink, R.D., et al.,
J. Biol. Chem. 260:2771 (1985)].

Hydrophilic Spacer Design

The placement of the protein of interest within the fusion construct is important for
efficient generation of authentic recombinant proteins of interest. Placement of the protein of
interest at the amino-terminus of the fusion protein has certain advantages. The protein of
interest will be the first amino acid sequence produced on the ribosome. The fact that the
protein of interest emerges from the ribosome before the fusion partner increases the
likelihood that the protein of interest will fold properly. The amino-terminal peptide begins
to fold as soon as it emerges from the ribosome without the interference of the fusion partner
[Georgiou, G. and Bowden, G.A., Inclusion Body Formation and the Recovery of Aggregated
Recombinant Proteins, in Recombinant DNA Technology and Applications, pp. 333-356,
McGraw Hill, Inc. (1991)]. The invention provides hydrophilic linkers which encode
hydrophilic spacers that permit the construction of expression vectors encoding fusion proteins
in which sequences encoding the protein of interest are located at the 5' end of the coding
region. The sequences encoding the protein of interest are linked to sequences encoding the
fusion partner domain through the hydrophilic linker in such a way as to facilitate the
generation of authentic recombinant proteins of interest.

The hydrophilic spacers of the present invention serve several purposes, including a
physical separation between the signal sequence-tagged protein of interest and the affinity
domain (e.g., immunoglobulin domains). The amino acids of the spacer are designed to be
highly hydrophilic, thus encouraging the appearance of the spacer towards the exterior of the
desired molecule, thereby increasing its exposure and availability for enzymatic cleavage. The
sequence encoding the recognition sequence for any known site-specific endoprotease can be
placed following the hydrophilic spacer. The specific endoprotease site chosen depends upon
the proteolytic susceptibility of the protein of interest. The hydrophilic spacer/endoprotease
site design generates a fusion protein in which it is possible to completely remove the
immunoglobulin domain from the protein of interest. The physical separation provided by the
hydrophilic spacer between the protein of interest and the affinity domain ensures the spatial
availability of the affinity domain to interact with the affinity matrix, as the possibility of
steric hindrance from the protein of interest is reduced.
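Screening a candidate protein of interest for internal copies of the recognition sequences listed under Site-Specific Endoproteases is one way to apply this rule. The Python sketch below is a hypothetical helper, not part of the original disclosure: it encodes a subset of the listed sites as regular expressions over one-letter sequences and reports where each enzyme would cut. A sequence match only flags potential susceptibility; as noted for papain, a site buried in the folded protein may not be cleaved.

```python
import re

# Recognition rules from the endoprotease list, in one-letter code.
# The consumed part of each pattern ends exactly at the cut; lookaheads
# encode residues required on the carboxyl side of the cut.
SITES = {
    "Factor Xa": r"(?:IEGR|IDGR|AEGR)(?=[^PR])",  # cut after Arg; X not Pro/Arg
    "Enterokinase": r"DDDDK",                     # cut after Lys
    "Thrombin": r"LVPR(?=G[^DE])",                # cut after Arg; X non-acidic
    "Kex2": r"[KR]R",                             # cut after Lys-Arg or Arg-Arg
    "Renin": r"PFHL(?=LVY)",                      # cut between the two Leu
    "Collagenase": r"P.(?=GP.)",                  # cut after X in Pro-X-Gly-Pro-Y
}


def find_internal_sites(seq: str) -> dict:
    """Map each enzyme to its cut positions (residues before the cut) in seq.

    Wrapping each rule in a lookahead lets overlapping sites (for example
    Arg-Arg-Arg for Kex2) all be reported.
    """
    hits = {}
    for enzyme, pattern in SITES.items():
        cuts = [m.start() + len(m.group(1))
                for m in re.finditer(f"(?=({pattern}))", seq)]
        if cuts:
            hits[enzyme] = cuts
    return hits


print(find_internal_sites("GAIEGRSDDDDKAT"))
# {'Factor Xa': [6], 'Enterokinase': [12]}
```

A protein that returns hits for an enzyme would, under this reasoning, call for a different endoprotease site in the linker.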

Parameters for the design of the hydrophilic spacers are deduced from the substrate
specificities of the known carboxypeptidases. These enzymes have different preferences for
particular amino acids when located at the ultimate position (i.e., the last residue) of the
influences the rate of hydrolysis. A review of the specificities of the serine carboxypeptidases
has been published [Breddam, K., Carlsberg Res. Commun. 51:83-128 (1986)] and the
specificities of the metallocarboxypeptidases A and B have been reviewed [Ambler, R.P.,
Methods Enzymol. 25:262 (1972)].

The hydrophilic spacer joined to a specific endoprotease site forms a functional unit.
This unit has a higher than normal probability of cleavage by endoproteases (due to the
hydrophilic nature of the spacer sequences) and the amino acids remaining on the desired
protein (post-cleavage) can be removed to generate authentic proteins. The protein which is
generated by endoproteolytic cleavage of the fusion protein is referred to as the "released
protein of interest." This term indicates that the protein of interest has been separated or
"released" from the affinity domain.

Three levels of hydrophilic spacer/endoprotease site (i.e., linker) designs are provided
in the present invention. The choice of a particular linker design depends on 1) the nature of
the carboxy-terminus of the protein of interest and 2) the specific endoprotease chosen for
cleavage of the fusion molecule. The term "level" refers to the level of processing required to
generate the authentic protein of interest following cleavage of the fusion protein.

In Level 1, the processing of the cleaved or "released" protein of interest to generate 
authentic protein requires either 1) no further treatment or 2) treatment with carboxypeptidase 
B. In Level 2, the cleaved or "released" protein of interest is treated with carboxypeptidase A 
and carboxypeptidase B. In Level 3, the released protein of interest is treated with 
carboxypeptidase A, carboxypeptidase B and carboxypeptidase Y. 
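The three processing levels amount to a lookup from linker level to carboxypeptidase treatment. The Python sketch below restates that mapping; the names are hypothetical, and the `remnant_left` flag is an assumption used to model the Level 1 case in which cleavage leaves no linker residues on the protein of interest (for example, OmpT cleavage next to a native carboxy-terminal Arg or Lys, as described for Level 1 linkers).

```python
# Carboxypeptidase treatment of the released protein of interest, by linker level.
LEVEL_TREATMENT = {
    1: ("carboxypeptidase B",),
    2: ("carboxypeptidase A", "carboxypeptidase B"),
    3: ("carboxypeptidase A", "carboxypeptidase B", "carboxypeptidase Y"),
}


def carboxypeptidases_for(level: int, remnant_left: bool = True) -> tuple:
    """Return the carboxypeptidases used after endoproteolytic release.

    remnant_left=False models the Level 1 case that needs no further
    treatment because no linker residues remain on the released protein.
    """
    if level not in LEVEL_TREATMENT:
        raise ValueError("linker level must be 1, 2 or 3")
    if level == 1 and not remnant_left:
        return ()
    return LEVEL_TREATMENT[level]


print(carboxypeptidases_for(2))  # ('carboxypeptidase A', 'carboxypeptidase B')
```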

These three levels of hydrophilic spacer/endoprotease site designs permit most proteins to be
produced in authentic form by recombinant means. It is noted that the vast majority of
proteins to be produced using the methods of the present invention will utilize a Level 2
or 3 design due to the increased specificity of the endoprotease used.

Level 1 Linker Designs 

The Level 1 linker design is the simplest functional unit, in which the endoprotease site
and the hydrophilic spacer comprise the same amino acids. The Level 1 design is employed
when endoproteases which cleave at basic amino acid residues are employed for the removal of
the affinity domain. Table 1 provides a list of endoproteases suitable for cleavage of Level 1
these residues may be present at that position. For example, furin will cleave at either
Arg-X-Arg-Arg (SEQ ID NO:14) or Arg-X-Lys-Arg (SEQ ID NO:15).



TABLE 1

Endoprotease          Cleavage site (^ = cut)     Residues left on released protein
Yeast Kex2            Arg/Lys-Arg^                Lys-Arg or Arg-Arg
OmpT, Protease VII    Arg/Lys^-Arg or Lys^-Lys    Arg or Lys
Clostropain           Arg^-Tyr                    Arg
Furin                 Arg-X-Arg/Lys-Arg^          Arg-X-Arg/Lys-Arg

The endoproteases shown in Table 1 are listed in order of those requiring the least
number of specific amino acid residues in the cleavage site to those requiring the greatest
number of specific residues. It is noted that trypsin will cleave at the recognition site for all
of the endoproteases listed in Table 1. The yeast Kex2, OmpT and protease VII proteases are
referred to as "dibasic recognition" or "dibasic" proteases; these enzymes require two adjacent
basic amino acid residues for cleavage. The sites cleaved by furin can also be cleaved by the
dibasic proteases and trypsin.

The Level 1 linker design is employed when the protein of interest is not susceptible
to digestion by one of the endoproteases listed in Table 1 and either 1) the naturally occurring
carboxy-terminal amino acid of the protein of interest is an arginine or a lysine or 2) a spacer
comprising basic amino acids is used to link the protein of interest and the affinity-purifiable
domain. When the protein of interest naturally terminates in an arginine or lysine residue, a
Level 1 linker can be employed which places an arginine or lysine residue next to the
carboxy-terminal residue of the protein of interest; in this way a cleavage site for OmpT
and/or protease VII is created. Cleavage of such a fusion protein with the OmpT protease or
protease VII will generate an authentic protein of interest without the need to further treat the
released protein of interest. When the protein of interest is not susceptible to digestion by one
of the endoproteases listed in Table 1 but does not contain a carboxy-terminal arginine or
lysine residue, a Level 1 linker is employed to join the protein of interest to the affinity

Figure 1 provides a schematic illustrating Level 1 processing. Figure 1 shows an
exemplary case where the hydrophilic spacer/endoprotease site employed contains a
recognition site for a dibasic protease and the affinity domain comprises the hinge and Fc
domains of an IgG. In Figure 1, step 1 shows the fusion protein (as a dimer of two molecules,
as the IgG sequences are capable of dimerization) bound to the affinity resin (e.g.,
protein A-Sepharose). In Level 1 processing, cleavage of the fusion protein generates a
released protein of interest which contains either an arginine or a lysine residue at the
carboxy-terminus (Figure 1, step 2). Authentic protein of interest is generated from the
released protein of interest by removal of the linker-encoded arginine or lysine residues (i.e.,
the residues comprising the hydrophilic spacer) by digestion with carboxypeptidase B.
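The carboxypeptidase B step of Level 1 processing can be modeled as repeatedly removing a carboxy-terminal Arg or Lys. The Python sketch below is an idealized model under assumptions not in the text: it ignores enzyme kinetics and any sequence context that might slow digestion, and the example sequences are hypothetical. It also illustrates why a protein whose native carboxy-terminus is Arg or Lys is handled differently: CPB would trim that native residue as well.

```python
BASIC = {"K", "R"}  # one-letter codes for lysine and arginine


def carboxypeptidase_b(released: str) -> str:
    """Idealized CPB digestion: strip C-terminal Lys/Arg residues one at a
    time until a non-basic residue is exposed."""
    seq = released
    while seq and seq[-1] in BASIC:
        seq = seq[:-1]
    return seq


# Hypothetical released protein: protein of interest "MAFTS" plus a basic spacer.
print(carboxypeptidase_b("MAFTSRRKK"))  # MAFTS
# A native C-terminal Lys is removed too, giving a non-authentic product.
print(carboxypeptidase_b("MAFTK"))      # MAFT
```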

There are processing advantages to using the enzymes listed in Table 1 above. These
enzymes recognize the amino acids arginine and/or lysine without the requirement for specific
amino acids in positions located toward the amino-terminus of the substrate. As discussed
below, generation of authentic protein products is achieved by incubating the cleaved
fusion protein with immobilized carboxypeptidase, thus removing the amino acids
comprising the hydrophilic spacer. Dibasic recognition proteases (i.e., yeast Kex2, OmpT and
protease VII) are preferred over trypsin due to their increased specificity. The OmpT protease
is a dibasic recognition protease which is readily isolated from the outer membrane of any
E. coli K strain which expresses the protease, such as LE392 (Stratagene), by incubating whole
cells with 30 mM n-octylglucoside [Grodberg, J. and Dunn, J.J., J. Bacteriol. 170:1245
(1988)].

Another advantage of using proteolytic enzymes specific for Arg-Arg or Lys-Arg (i.e.,
a dibasic recognition protease) is that many proteins are synthesized as precursor molecules
(e.g., prohormones) that require proteolytic processing to produce the active or mature form
of the protein. Specialized secretory cells are required to process these proteins during
secretion [Thomas G. et al., Science 232:1641 (1986)]. Prokaryotic and some eukaryotic cells
are not capable of processing secretory proteins. The processing of the prohormone to the
hormone form of peptide hormones involves cleavage after a pair of basic amino acid
residues (i.e., a dibasic Kex2 site). These dibasic sites comprise Arg-Arg or Lys-Arg. Thus,
when the protein of interest is a peptide hormone, the expression vector will contain
sequences encoding the prohormone form of the protein of interest, a Level 1 spacer and the
digestion with the Kex2 dibasic recognition protease, the mature form of the hormone is also
generated by cleavage of the dibasic site internal to the prohormone.
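
The dibasic processing described above can be illustrated with a toy model that cuts a sequence on the carboxyl side of every Lys-Arg or Arg-Arg pair. It assumes every such pair is cleaved, which, as noted later in this description, is not true of all natural dibasic sites; `kex2_cleave` and the example construct are hypothetical.

```python
from typing import List

# Dibasic pairs named in the text as Kex2 processing sites.
KEX2_PAIRS = {("Lys", "Arg"), ("Arg", "Arg")}


def kex2_cleave(residues: List[str]) -> List[List[str]]:
    """Split `residues` after each Lys-Arg or Arg-Arg pair.

    A pair is only considered if both residues belong to the fragment
    still being built, so one residue is never shared by two sites.
    """
    fragments: List[List[str]] = []
    start = 0
    for i in range(1, len(residues)):
        pair = (residues[i - 1], residues[i])
        if i - 1 >= start and pair in KEX2_PAIRS:
            fragments.append(residues[start:i + 1])  # cut after residue i
            start = i + 1
    if start < len(residues):
        fragments.append(residues[start:])
    return fragments


# Hypothetical construct: pro-region, Lys-Arg site, mature hormone,
# an Arg-Arg pair at the start of the linker, then the affinity domain.
construct = ["Ala", "Pro", "Lys", "Arg", "Phe", "Val", "Asn",
             "Arg", "Arg", "Glu", "Pro"]
for fragment in kex2_cleave(construct):
    print("-".join(fragment))
```

This prints Ala-Pro-Lys-Arg, Phe-Val-Asn-Arg-Arg and Glu-Pro. In this model the middle fragment (the mature hormone) still carries the linker-derived Arg-Arg, which a subsequent CPB digestion would remove.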



Level 2 Linker Designs 

Level 2 spacer/endoprotease site (i.e., linker) designs are used in combination with
endoproteases that leave a portion of their recognition sequence behind after proteolytic
cleavage. This remnant, because of its amino acid sequence, can be removed by sequential
treatment with carboxypeptidase A (CPA) and carboxypeptidase B (CPB). CPB removes
carboxy-terminal arginine or lysine residues only. CPA can rapidly digest or remove
carboxy-terminal tyrosine, phenylalanine, tryptophan, leucine, isoleucine, methionine,
threonine, glutamine, histidine, alanine and valine residues. CPA removes carboxy-terminal
asparagine, serine and lysine slowly; glycine, aspartic acid, glutamic acid and cysteine
derivatives (e.g., CySO3H and S-carboxymethylcysteine) are removed very slowly by CPA;
CPA cannot cleave or remove arginine and proline residues.
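
The specificities listed above can be captured as a lookup table. This is a simplified sketch, assuming the rate classes can be treated as categorical labels (real release rates also depend on the neighbouring residue, as discussed next); the names `CPA_RATE`, `CPB_REMOVES` and `removable_by` are hypothetical.

```python
from typing import Dict, Set

# CPA release classes as listed in the text (three-letter codes).
CPA_RATE: Dict[str, str] = {}
for aa in ("Tyr", "Phe", "Trp", "Leu", "Ile", "Met",
           "Thr", "Gln", "His", "Ala", "Val"):
    CPA_RATE[aa] = "rapid"
for aa in ("Asn", "Ser", "Lys"):
    CPA_RATE[aa] = "slow"
for aa in ("Gly", "Asp", "Glu", "CySO3H", "S-carboxymethylcysteine"):
    CPA_RATE[aa] = "very slow"
for aa in ("Arg", "Pro"):
    CPA_RATE[aa] = "none"

# CPB removes carboxy-terminal Arg or Lys only.
CPB_REMOVES: Set[str] = {"Arg", "Lys"}


def removable_by(residue: str) -> Set[str]:
    """Return the set of enzymes (CPA, CPB) able to remove `residue`."""
    enzymes: Set[str] = set()
    if CPA_RATE.get(residue, "none") != "none":
        enzymes.add("CPA")
    if residue in CPB_REMOVES:
        enzymes.add("CPB")
    return enzymes


# Residues in the table that neither enzyme can remove.
print([aa for aa in CPA_RATE if not removable_by(aa)])  # ['Pro']
```

The two enzymes are complementary: arginine is left to CPB, and proline is the only listed residue that neither releases, which is why the Level 3 designs bring in a third carboxypeptidase.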

Thus, using CPA and CPB in combination, all amino acids can be removed from the
released protein of interest except for proline, which neither CPB nor CPA can remove.
Amino acids that are released very slowly or not at all (e.g., proline or arginine) will, when
present in the penultimate position, generally decrease the rate of release of the
carboxy-terminal amino acid [Ambler, supra]. The addition of the leucine residue into the
enterokinase linker allows CPA to proceed smoothly to the arginine residue by avoiding the
extremely slow Arg-Asp step (the CPA digestion is conducted at 37°C).

Table 2 below provides examples of Level 2 linker designs for use with specific
endoproteases. In Table 2, underlining is used to indicate amino acid residues provided by the
hydrophilic spacer (the hydrophilic spacer may contain additional hydrophilic amino acid
residues). In Table 2, bold type is used to indicate the penultimate enhancer. A penultimate
enhancer is an element used to promote the efficient removal of the amino-terminal residue
of the proteolytic recognition sequence during the carboxypeptidase reactions of Level 2 or 3
designs. Specific endoprotease recognition sites are provided, and the arrow indicates the
location of the cleavage within these sites.
TABLE 2

Spacer/Endoprotease Site                Endoprotease          Digestion Protocol*
Arg-Lys-Lys-Ile-Glu-Gly-Arg↓            Factor Xa             CPB, CPA, CPB
Arg-Lys-Arg-Phe-Val-Arg↓-Gly            X. laevis protease†   CPB, CPA, CPB
Arg-Lys-Lys-Leu-Asp-Asp-Asp-Asp-Lys↓    Enterokinase          CPB, CPA, CPB

↓ cleavage site
* sequential digestion using the indicated enzymes in an immobilized form
† RXVRG-endoprotease from X. laevis

Figure 2 provides a schematic which represents the generation of authentic protein
using the Level 2 spacer design. In Figure 2, basic amino acids which can be removed by
CPB are represented by the circles and amino acids which can be removed by CPA are
represented by the squares. Level 2 processing is illustrated using a hydrophilic spacer which
comprises the sequence Arg-Arg-Lys (SEQ ID NO:16); the spacer is followed by a leucine
residue which functions as a penultimate enhancer; the penultimate enhancer is followed by
the recognition site for the endoprotease enterokinase [Asp-Asp-Asp-Asp-Lys (SEQ ID
NO:13)]. Step 1 of Figure 2 shows the released protein of interest generated by digestion of
the fusion protein with enterokinase (enterokinase cleaves on the carboxy-terminal side of the
lysine residue present in the enterokinase recognition site); the released protein is then treated
with CPB to remove the terminal lysine residue. Step 2 of Figure 2 shows the released
protein of interest following treatment with CPB and indicates that the released protein of
interest is now to be treated with CPA to remove the aspartic acid and leucine residues. In all
Level 2 and 3 designs, the preferred hydrophilic spacer has a lysine residue at its
carboxy-terminal position to allow efficient transition from CPA digestion to CPB digestion.
Carboxy-terminal lysine residues can be removed with CPA and/or CPB. The lysine residues
allow CPA to proceed completely through the remaining endoprotease recognition sequence or
penultimate enhancer without any inhibition. An arginine residue in the same position would
slow the reaction and therefore is not preferred.
will be available for CPB digestion. Following treatment with CPA, the released protein of
interest is treated again with CPB to remove any remaining lysine residues and the arginine
residues (step 3) to generate the authentic protein of interest (step 4). As discussed in greater
detail below, removal of the amino acids which comprise the endoprotease site and the
hydrophilic spacer can be achieved using immobilized forms of the carboxypeptidases. The
use of immobilized enzymes is advantageous, as this obviates the need to remove the
carboxypeptidases from the final preparation of the authentic protein and allows the sequential
digestion of the released protein of interest with the carboxypeptidases.
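
The Figure 2 sequence of digestions can be simulated with idealized enzymes. The sketch assumes each immobilized digestion runs to completion, that CPB removes only Arg and Lys, and that CPA removes every residue except Arg, Pro and hydroxyproline (so slowly released residues such as Asp and Lys are eventually removed); the protein Met-Gly-Ser and all function names are hypothetical.

```python
from typing import Callable, List

Residues = List[str]


def cpb(residues: Residues) -> Residues:
    """Idealized CPB: strip C-terminal Arg/Lys until another residue is exposed."""
    out = list(residues)
    while out and out[-1] in {"Arg", "Lys"}:
        out.pop()
    return out


def cpa(residues: Residues) -> Residues:
    """Idealized CPA run to completion: halts at Arg, Pro or hydroxyproline."""
    out = list(residues)
    while out and out[-1] not in {"Arg", "Pro", "Hyp"}:
        out.pop()
    return out


def run_protocol(residues: Residues,
                 steps: List[Callable[[Residues], Residues]]) -> Residues:
    """Apply each digestion in order, printing the C-terminal end after each."""
    for step in steps:
        residues = step(residues)
        print(f"after {step.__name__.upper()}: ...{'-'.join(residues[-3:])}")
    return residues


protein = ["Met", "Gly", "Ser"]  # hypothetical authentic protein
# Released protein after enterokinase: spacer Arg-Arg-Lys, Leu enhancer,
# and the Asp-Asp-Asp-Asp-Lys remnant of the recognition site.
released = protein + ["Arg", "Arg", "Lys", "Leu",
                      "Asp", "Asp", "Asp", "Asp", "Lys"]
result = run_protocol(released, [cpb, cpa, cpb])
print(result == protein)  # True
```

Because CPA halts at the spacer's arginine, the first CPB and CPA steps can run to completion without reaching the protein of interest; only the final CPB step acts next to it, which is also why, in this model, a protein ending in lysine or arginine would lose that residue.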

Level 2 designs are used when the protein of interest would be susceptible to the
cleavage protocol described above for the Level 1 design. Level 1 linkers comprise
hydrophilic spacer sequences that do not require additional endoprotease sequences because
the endoproteases used in the Level 1 design recognize and cleave the hydrophilic spacers.
Level 2 linkers encode protease recognition sites for proteases that leave amino acids on the
carboxy-terminus of the protein of interest which cannot be removed by digestion with CPB.
Level 2 denotes that additional in vitro processing steps are needed to generate authentic
protein molecules; specifically, one or more CPA digestions are required. Due to the
specificities of the carboxypeptidases and the digestion conditions utilized in conjunction with
the Level 2 and 3 linkers, it is not possible to generate authentic proteins that have
carboxy-terminal lysine residues using carboxypeptidases to digest non-authentic amino acid
residues from the protein of interest. All of the currently characterized carboxypeptidases can
remove lysine residues under the conditions described herein. However, the Level 1 linker
design that inserts a single arginine residue after the naturally occurring lysine residue to create
an OmpT/protease VII cleavage site permits the generation of authentic proteins which
terminate (carboxy-terminus) with lysine.

Arginine residues at the carboxy terminus of authentic proteins produced with Level 2 or 3
linkers can be handled in one of two ways. The first method adds a hydrophilic spacer that is
composed of lysine only [e.g., Lys-Lys-Lys (SEQ ID NO:19)]. This hydrophilic spacer is placed
following the natural arginine, and allows the hydrophilic spacer to be removed during the
CPA digestion without the requirement of a CPB digestion. The second method adds a
hydrophilic spacer that contains arginine residues and requires alternating CPA and CPB
digestions to generate authentic protein with a carboxy-terminal arginine. A leucine residue is
is used as described to remove the hydrophilic spacer, stopping at the inserted leucine residue.
A final CPA digestion is used to remove the leucine residue and generate an authentic protein.
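
The first of the two methods above can be checked with an idealized model of CPA and CPB: with a lysine-only spacer placed after a natural carboxy-terminal arginine, CPA run to completion stops at that arginine, so no CPB step is needed, and a CPB step would in fact remove the authentic arginine. The protein sequence and the enterokinase remnant are assumed for illustration, and CPA is assumed to release lysine and aspartic acid given enough time.

```python
from typing import List


def cpa(residues: List[str]) -> List[str]:
    """Idealized CPA run to completion: halts at Arg, Pro or hydroxyproline."""
    out = list(residues)
    while out and out[-1] not in {"Arg", "Pro", "Hyp"}:
        out.pop()
    return out


def cpb(residues: List[str]) -> List[str]:
    """Idealized CPB: strips C-terminal Arg/Lys."""
    out = list(residues)
    while out and out[-1] in {"Arg", "Lys"}:
        out.pop()
    return out


protein = ["Gly", "Ser", "Arg"]  # hypothetical protein ending in Arg
spacer = ["Lys", "Lys", "Lys"]   # lysine-only spacer, Lys-Lys-Lys
remnant = ["Leu", "Asp", "Asp", "Asp", "Asp", "Lys"]  # assumed enterokinase remnant

released = protein + spacer + remnant
authentic = cpa(released)
print(authentic == protein)       # True: CPA alone suffices
print(cpb(authentic) == protein)  # False: CPB would strip the natural Arg
```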



Level 3 Linker Designs 

The Level 3 linker designs take into consideration the fact that many specific
endoproteases require proline residues in their recognition sequence for optimum activity.
Since proline cannot be removed using CPA or CPB, another carboxypeptidase with this
capability must be used. Carboxypeptidase Y (CPD-Y) is chosen due to the well-characterized
preference of this enzyme for hydrophobic amino acids [Breddam and Ottesen,
Carlsberg Res. Commun. 52:55 (1987)]. This yeast carboxypeptidase can digest all naturally
occurring amino acids, but it has a preference for hydrophobic amino acids in both the
ultimate and penultimate positions. A general preference profile for the CPD-Y enzyme at
pH 6.5 has been described [Breddam, Carlsberg Res. Commun. 51:83 (1986)] and is shown
below:

Penultimate position: Phe > Leu > Ala > His > Glu > Gly >> Lys

Ultimate position: Met, Ile, Leu > Phe > Ala > Arg > Ser > Pro > Lys > Asn > Gly > Asp

The above preferences for the CPD-Y enzyme are listed in order of decreasing kcat/Km
values. In cases where the values deviate by less than 20%, a comma is used in place of the
greater-than symbol (>).
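
The preference profile can be encoded as ordered tiers for quick comparisons. This is a sketch that keeps only the ordinal ranking (not the size of the differences) and covers only the residues listed; `tier` and `better_penultimate` are hypothetical helpers.

```python
from typing import List

# Tiers in order of decreasing kcat/Km at pH 6.5; residues sharing a tier
# differ by less than 20% (shown with commas in the profile above).
PENULTIMATE_TIERS: List[List[str]] = [
    ["Phe"], ["Leu"], ["Ala"], ["His"], ["Glu"], ["Gly"], ["Lys"],
]
ULTIMATE_TIERS: List[List[str]] = [
    ["Met", "Ile", "Leu"], ["Phe"], ["Ala"], ["Arg"], ["Ser"],
    ["Pro"], ["Lys"], ["Asn"], ["Gly"], ["Asp"],
]


def tier(residue: str, tiers: List[List[str]]) -> int:
    """Return the 0-based tier of `residue` (0 = most preferred).

    Raises KeyError for residues the published profile does not rank.
    """
    for rank, group in enumerate(tiers):
        if residue in group:
            return rank
    raise KeyError(residue)


def better_penultimate(a: str, b: str) -> bool:
    """True if CPD-Y prefers residue `a` over `b` in the penultimate position."""
    return tier(a, PENULTIMATE_TIERS) < tier(b, PENULTIMATE_TIERS)


print(better_penultimate("Leu", "Lys"))  # True
print(tier("Pro", ULTIMATE_TIERS))       # 5
```

The Leu-versus-Lys comparison is the basis of the penultimate-enhancer design described below for the collagenase and renin linkers.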

CPD-Y can digest every amino acid, although the different amino acids are removed
at varying rates. In order to selectively remove a proline residue from the carboxy-terminus
of a population of molecules comprising the released protein of interest without
proceeding into the protein of interest itself, the hydrophilic linker must also provide
protection against excessive carboxy-terminal degradation by CPD-Y. There are sequences
that are reported to be resistant to CPD-Y digestion (at pH 4.5), namely Arg-Arg and Lys-X
[Klaiskov et al., Analytical Biochem. 180:28-37 (1989)]. These sequences are accordingly
incorporated into the hydrophilic spacer region when designing the Level 3 linkers which
encode the hydrophilic spacers.

Table 3 provides examples of Level 3 linker designs for use with specific
endoproteases. In Table 3, underlining is used to indicate amino acid residues provided by the
hydrophilic spacer (the hydrophilic spacer may contain additional hydrophilic amino acid
residues). In Table 3, bold type is used to indicate the penultimate enhancer. A penultimate
enhancer is an element used to promote the efficient removal of the amino-terminal residue
of the proteolytic recognition sequence during the carboxypeptidase reactions of Level 2 or 3
linkers. Specific endoprotease recognition sites are provided, and the arrow indicates the
location of the cleavage within these sites.



TABLE 3

Spacer/Endoprotease Site                          Endoprotease  Digestion Protocol
Arg-Arg-Leu-Val-Pro-Arg↓-Gly                      Thrombin      CPB, CPD-Y†, CPA, CPB
Arg-Arg-Lys-Lys-Lys-Leu-Val-Pro-Arg↓-Gly          Thrombin      CPB, CPD-Y†, CPA, CPB
Arg-Lys-Lys-Val-Pro-Phe-Arg↓-Ser                  Kallikrein    CPB, CPD-Y†, CPA, CPB
Arg-Lys-Lys-Leu-Pro-Leu↓-Gly-Pro                  Collagenase   CPA, CPD-Y†, CPA, CPB
Arg-Lys-Lys-Lys-Leu-Pro-Phe-His-Leu↓-Leu-Val-Tyr  Renin         CPA, CPD-Y†, CPA, CPB

↓ cleavage site
† immobilized enzyme, limited flow digest

The penultimate enhancers shown above in the collagenase and renin linkers allow
CPD-Y to remove the proline residues present in the endoprotease recognition sequence after
cleavage more efficiently than if the endoprotease site sequence were to be directly linked to
the hydrophilic spacer sequence. The lysine residue present in the hydrophilic spacers listed
above is the residue least preferred by CPD-Y (when the lysine is present in the penultimate
position shown above). The direct linking of lysine to proline would result in an extremely
slow digestion step during the CPD-Y flow digestion. In order to significantly raise the
kcat/Km for proline removal, an amino acid which is preferred by CPD-Y when in the
preferred amino acid to be inserted after the hydrophilic spacer because it also prevents
cleavage after the carboxy-terminal hydrophilic spacer residue by precursor processing
enzymes, such as furin/PACE and PC1/PC3 [Nakayama, et al., J. Biol. Chem. 267:16335
(1992)].

Figure 3 provides a schematic which represents the generation of authentic protein
using the Level 3 linker design. In Figure 3, basic amino acids (i.e., amino acids having
positively charged side chains) which can be removed by CPB are represented by the circles;
amino acids which can be removed by CPA are represented by the squares; residues which
are removed by CPD-Y (e.g., proline) are represented by an arrowhead. The term
"penultimate enhancer" refers to the use of a non-hydrophilic amino acid (e.g., leucine) which,
when located next to a carboxy-terminal proline residue, will enhance the removal of proline
by CPD-Y.

In Figure 3, Level 3 processing is illustrated using a fusion protein which contains a
hydrophilic spacer comprising the sequence Arg-Lys-Lys (SEQ ID NO:17); the spacer is
followed by a leucine residue which serves as a penultimate enhancer allowing the efficient
removal of the proline residue by CPD-Y. The penultimate enhancer is followed by the
recognition site for the endoprotease renin [Pro-Phe-His-Leu↓-Leu-Val-Tyr (SEQ ID NO:3);
the arrow indicates the site of cleavage].

Step 1 of Figure 3 shows the released protein of interest generated by digestion of the
fusion protein with renin (renin cleaves on the carboxy-terminal side of the first leucine
residue present in the renin recognition site); this protein is then treated with CPA to remove
the leucine, histidine and phenylalanine residues which remain after digestion of the fusion
protein with renin. This first CPA digestion is allowed to go to completion, as the proline
residue will halt digestion by CPA. The CPA-treated released protein is then treated with
CPD-Y to remove the terminal proline residue (Step 2 of Figure 3); the use of the leucine
residue as a penultimate enhancer allows the efficient digestion of proline by CPD-Y.
Following treatment with CPD-Y, the protein of interest is treated with CPA to remove the
leucine residue. The lysine and arginine residues of the hydrophilic spacer are then removed
by digestion with CPB (Step 4) to generate the authentic protein of interest (Step 5).
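
The Figure 3 workflow can be traced with an idealized model that adds a CPD-Y step to CPA and CPB. The sketch assumes the limited CPD-Y flow digestion removes exactly one carboxy-terminal residue from every molecule (the uniform, limited removal that the immobilized format aims for), and that CPA run to completion also releases the spacer lysines before halting at arginine; Figure 3 instead assigns those lysines to the final CPB step, which gives the same end product. The protein sequence and the function names are hypothetical.

```python
from typing import List

Residues = List[str]


def cpa(residues: Residues) -> Residues:
    """Idealized CPA run to completion: halts at Arg, Pro or hydroxyproline."""
    out = list(residues)
    while out and out[-1] not in {"Arg", "Pro", "Hyp"}:
        out.pop()
    return out


def cpb(residues: Residues) -> Residues:
    """Idealized CPB: strips C-terminal Arg/Lys."""
    out = list(residues)
    while out and out[-1] in {"Arg", "Lys"}:
        out.pop()
    return out


def cpdy_limited(residues: Residues) -> Residues:
    """Idealized limited CPD-Y digest: removes exactly one C-terminal residue."""
    return list(residues[:-1])


protein = ["Met", "Gly", "Ser"]              # hypothetical authentic protein
released = protein + ["Arg", "Lys", "Lys",   # hydrophilic spacer
                      "Leu",                 # penultimate enhancer
                      "Pro", "Phe", "His", "Leu"]  # renin site left after Leu|Leu

step1 = cpa(released)         # removes Leu, His, Phe; halts at Pro
step2 = cpdy_limited(step1)   # removes Pro
step3 = cpa(step2)            # removes Leu (and, in this model, Lys-Lys)
step4 = cpb(step3)            # removes the remaining Arg
print(step1[-1], step2[-1], step3[-1])  # Pro Leu Arg
print(step4 == protein)                 # True
```

Applying `cpdy_limited` repeatedly would keep removing residues, eventually including those of the protein of interest, which is why the CPD-Y step must be limited and the spacer must protect the protein.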

The above discussion provides guidance for the selection of a particular design of 
spacer/endoprotease sites to be used to join the protein of interest with the affinity domain. 
More guidance is provided below.
The following are preferred forms of hydrophilic spacer sequences: Arg-Arg-Lys
(SEQ ID NO:16); Arg-Lys-Lys (SEQ ID NO:17); Lys-Arg-Lys (SEQ ID NO:18); Lys-Lys-Lys
(SEQ ID NO:19); Arg-Arg-Arg-Lys (SEQ ID NO:20); Arg-Arg-Lys-Lys (SEQ ID NO:21);
Lys-Arg-Arg-Lys (SEQ ID NO:22); Arg-Lys-Arg-Lys (SEQ ID NO:23); Arg-Lys-Lys-Lys
(SEQ ID NO:24); Lys-Arg-Lys-Lys (SEQ ID NO:25); Lys-Lys-Arg-Lys (SEQ ID NO:26);
Arg-Arg-Arg-Arg-Lys (SEQ ID NO:27); Arg-Arg-Arg-Lys-Lys (SEQ ID NO:28);
Arg-Arg-Lys-Arg-Lys (SEQ ID NO:29); Arg-Lys-Arg-Arg-Lys (SEQ ID NO:30);
Lys-Arg-Arg-Arg-Lys (SEQ ID NO:31); Arg-Arg-Lys-Lys-Lys (SEQ ID NO:32);
Arg-Lys-Arg-Lys-Lys (SEQ ID NO:33); Arg-Lys-Lys-Arg-Lys (SEQ ID NO:34);
Lys-Arg-Arg-Lys-Lys (SEQ ID NO:35); Lys-Arg-Lys-Arg-Lys (SEQ ID NO:36);
Lys-Arg-Arg-Lys-Lys (SEQ ID NO:37); and Arg-Lys-Lys-Lys-Lys (SEQ ID NO:38). These
preferred hydrophilic spacers can be used in Level 1, 2 or 3 linker designs; these spacers can
be used when the fusion protein is to be expressed in non-endocrine mammalian cell lines.
Fusion proteins comprising proteins of interest which end in an arginine or lysine residue
require the insertion of a leucine residue between the carboxy-terminal arginine or lysine of
the protein of interest and the hydrophilic spacer (as described above for Level 2 designs).

The above-listed sequences represent preferred spacer sequences which should be
adequate for separating the desired endoprotease site from the carboxy-terminus of the protein
of interest. The invention also contemplates the insertion of hydrophilic triplets such as
Lys-Lys-Lys (SEQ ID NO:19) and Lys-Arg-Lys (SEQ ID NO:18) at the amino-terminal end of
any of the above-listed spacers to generate extended hydrophilic spacers. These longer (i.e.,
extended) spacers are employed when the carboxy-terminus of the protein of interest is
sufficiently buried within the hydrophobic interior of the protein so as to structurally inhibit
the removal of any remaining endoprotease recognition sequences and/or the penultimate
enhancer by CPA digestion. Traditional approaches to dealing with the cleavage of fusion
proteins having a buried carboxy-terminus of the protein of interest employ a denaturant
during the digestion of the fusion protein. This approach is not appropriate when CPA is to
be employed, as CPA loses most of its activity under denaturing conditions. The use of the
"extended hydrophilic spacers" is appropriate when the protein of interest is large and has a
hydrophobic carboxy-terminus. The use of the additional hydrophilic triplets will extend the
amino acids of the remaining endoprotease recognition sequence and/or penultimate
enhancer towards the hydrophilic exterior of the protein, thereby allowing digestion of these
be removed by digestion with CPB under denaturing conditions (e.g., in the presence of
2-6 M urea) [Sassenfeld, H. M. and Brewer, S. J., Bio/Technology, January, p. 76 (1984)].
Proteases that cleave to leave a lysine residue are the preferred method for removal of the
affinity domain until carboxypeptidases which cannot remove lysine residues become
available.



Host Cells And The Use Of Level 1, 2 Or 3 Designs

The production of recombinant proteins often involves the use of protease inhibitors to
prevent the degradation of the recombinant protein (e.g., fusion protein) before it can be
isolated in a purified form. Numerous protease inhibitors are known to the art and include,
but are not limited to, leupeptin, pepstatin A, antipain, aprotinin, PEFABLOC (Pentapharm
Ltd., Basel, Switzerland), chymostatin, trypsin inhibitor from soybean, FBS-d-PI,
phenylmethylsulfonyl fluoride (PMSF) and (4-amidinophenyl)methanesulfonyl fluoride
(APMSF). Due to the design of the hydrophilic spacers of the present invention, steps must
be taken to inhibit trypsin and other serine proteases that recognize arginine and/or lysine
residues to prevent the cleavage of the fusion proteins. In selecting a cell line to be used as a
host cell for the production of fusion proteins, the cell line is screened for the ability to
produce and/or secrete proteases which can cleave the hydrophilic spacers of the invention.
In addition, medium supplements should also be monitored for the presence of these
proteases. Cell lines (and culture supernatant from cell lines) and medium supplements can
be monitored using commercially available synthetic peptide substrates. Four particularly
useful synthetic substrates are N-benzoyl-Val-Lys-Lys-Arg-4-methoxy-β-naphthylamide,
N-t-Boc-Glu-Lys-Lys-7-amido-4-methylcoumarin,
N-t-Boc-Gly-Arg-Arg-7-amido-4-methylcoumarin and
N-t-Boc-Gly-Lys-Arg-7-amido-4-methylcoumarin [Mizuno et al.,
Biochem. Biophys. Res. Commun. 144:807 (1987)]; all of these substrates are available from
Sigma. Cell lines and medium supplements which express the least amount of protease
activity on these types of substrates (i.e., substrates containing arginine and/or lysine residues)
are preferred.

Protease activity capable of producing detectable cleavage of the above synthetic
substrates and/or of the hydrophilic linker of the fusion proteins of the invention, which is
present in cell lines and medium supplements to be used, may be inhibited by the inclusion of
one or more protease inhibitors in the growth medium and in all solutions used prior to
purification steps which remove proteases (e.g., affinity purification). When protease
inhibitors are to be added to the growth medium, the following protease inhibitors derived
from natural sources are preferred: aprotinin (Sigma) derived from bovine lung [Weidie, et al.,
Gene 73:439 (1988)], trypsin inhibitor from soybean and FBS-d-PI, which is present in fetal
bovine serum [Shinommura, et al., Cytotechnology 6:1 (1991)]. Inclusion of one or more of
these preferred protease inhibitors in the growth medium will prevent or minimize the
cleavage of secreted fusion proteins prior to the isolation of culture supernatant containing the
fusion protein.

The hydrophilic spacers of the present invention are susceptible to cleavage by the
dibasic proteases present in a variety of host cells. The cell lines used for expression of the
fusion proteins must lack proteases which can cleave the hydrophilic linker sequences.
Expression of the fusion proteins in bacterial cells requires the use of bacterial strains which
lack the OmpT protease [e.g., CI 757, B834 (Novagen), UT4400 and BL21 (Novagen);
Grodberg and Dunn, J. Bacteriol. 170:1245 (1988)]. The AG1 strain (Stratagene) is a
derivative of the DH1 strain which contains very low levels of OmpT protease. Expression of
the fusion proteins in yeast may be achieved using S. cerevisiae kex2 mutant strains [e.g.,
XBH16-15A, RW427 and RW433; Julius D., et al., Cell 37:1075 (1984)].

Insect cells which lack protease activity have not been reported. Accordingly, when
fusion proteins are to be expressed in insect cells [e.g., Sf9, Sf21 and MG1 cells (Stratagene)],
the following hydrophilic spacers are used: Arg-Lys-Lys (SEQ ID NO:17), Arg-Lys-Lys-Lys
(SEQ ID NO:24) and Arg-Lys-Lys-Lys-Lys (SEQ ID NO:38). If an extended hydrophilic
spacer is to be employed for the expression of fusion proteins in insect cells, the lysine triplet
(SEQ ID NO:19) can be added to the carboxy-terminal end of the above three spacers. The
ability of the Sf9 insect cell line to at least partially process proNGF into authentic, active
NGF by cleavage of the naturally occurring preprocessing site Arg-Ser-Lys-Arg (SEQ ID
NO:39) (U.S. Patent No. 5,272,063, the disclosure of which is herein incorporated by
reference) limits the use of hydrophilic spacers to those containing Arg-Lys and Lys-Lys
amino acid combinations and those lacking Arg-Arg and Lys-Arg combinations.
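
The insect-cell restriction can be expressed as a screen over the preferred spacer list given earlier: reject any spacer containing an Arg-Arg or Lys-Arg pair. This is a sketch of that single rule only (it does not model other cleavage determinants); the dictionary transcribes the spacers and SEQ ID numbers from the list above, and `insect_cell_compatible` is a hypothetical name.

```python
from typing import Dict, List

# Preferred hydrophilic spacers, keyed by SEQ ID NO, as listed in the text.
SPACERS: Dict[int, str] = {
    16: "Arg-Arg-Lys", 17: "Arg-Lys-Lys", 18: "Lys-Arg-Lys",
    19: "Lys-Lys-Lys", 20: "Arg-Arg-Arg-Lys", 21: "Arg-Arg-Lys-Lys",
    22: "Lys-Arg-Arg-Lys", 23: "Arg-Lys-Arg-Lys", 24: "Arg-Lys-Lys-Lys",
    25: "Lys-Arg-Lys-Lys", 26: "Lys-Lys-Arg-Lys", 27: "Arg-Arg-Arg-Arg-Lys",
    28: "Arg-Arg-Arg-Lys-Lys", 29: "Arg-Arg-Lys-Arg-Lys",
    30: "Arg-Lys-Arg-Arg-Lys", 31: "Lys-Arg-Arg-Arg-Lys",
    32: "Arg-Arg-Lys-Lys-Lys", 33: "Arg-Lys-Arg-Lys-Lys",
    34: "Arg-Lys-Lys-Arg-Lys", 35: "Lys-Arg-Arg-Lys-Lys",
    36: "Lys-Arg-Lys-Arg-Lys", 37: "Lys-Arg-Arg-Lys-Lys",
    38: "Arg-Lys-Lys-Lys-Lys",
}

# Dipeptides that insect-cell processing may cleave, per the text.
FORBIDDEN_PAIRS = {("Arg", "Arg"), ("Lys", "Arg")}


def insect_cell_compatible(spacer: str) -> bool:
    """True if no adjacent Arg-Arg or Lys-Arg pair occurs in `spacer`."""
    residues = spacer.split("-")
    return all((a, b) not in FORBIDDEN_PAIRS
               for a, b in zip(residues, residues[1:]))


ok: List[int] = [seq_id for seq_id, s in SPACERS.items()
                 if insect_cell_compatible(s)]
print(ok)  # [17, 19, 24, 38]
```

The screen returns the three spacers named for insect cells plus the lysine triplet used for extension, consistent with the text.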

Expression of the fusion proteins of the present invention in mammalian cell lines
requires the use of cell lines which have a limited ability to cleave dibasic residues. The
enzymes responsible for the dibasic processing of prepro precursor molecules of the endocrine
system have been termed PC2 and PC1/PC3. These enzymes are part of the regulated
These enzymes have the ability to cleave dibasic (e.g., Lys-Arg and Arg-Arg) and monobasic
sites (Arg-X-X-Arg) (SEQ ID NO:40) [Molloy, et al., J. Biol. Chem. 267:16396 (1992)]. Cell
lines derived from secretory cells of the endocrine system (e.g., AtT-20 mouse pituitary)
should be avoided when the fusion proteins are to be expressed in mammalian host cells.
Most mammalian cell lines have enzymes which process proproteins into mature
proteins during the export of the protein in a constitutive secretion pathway. The isozymes of
the constitutive secretion pathway are termed PACE (paired basic amino acid cleaving
enzyme)/furin and PACE4; these enzymes are ubiquitously expressed and are present at
varying levels in most mammalian cell lines. The substrate specificity of furin [recognition
site: Arg-X-Arg/Lys-Arg (SEQ ID NOS:14 and 15)] has been studied using variant synthetic
peptides based on the N-terminal sequence of human prealbumin [Brennan and Nakayama,
FEBS Letters 347:80 (1994)]. These experiments concluded that furin could not cleave in the
middle of a tetra-basic sequence, which indicates that the protease cannot cleave between two
basic residues. Another established substrate specificity requirement prevents the cleavage of
a furin recognition motif (Arg-X-Arg/Lys-Arg-Y) where Y is a hydrophobic aliphatic residue
(i.e., leucine, isoleucine, valine) [Nakayama, et al., J. Biol. Chem. 267:16335 (1992)]. These
sequence requirements were used as guidelines in the design of hydrophilic spacers of the
present invention to prevent the cleavage of fusion proteins during secretion in mammalian
host cell lines.
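
The furin requirements above can be turned into a simple scan. The sketch assumes a site is the motif Arg-X-Arg/Lys-Arg with cleavage after the final Arg, and applies the two exclusions described: no cleavage when the following residue (Y) is leucine, isoleucine or valine, and none when it is arginine or lysine (cleavage between two basic residues). Other determinants such as exposure and conformation are ignored, and `furin_sites` is a hypothetical name.

```python
from typing import List

BLOCKING_P1_PRIME = {"Leu", "Ile", "Val",  # hydrophobic aliphatic Y
                     "Arg", "Lys"}         # no cut between two basic residues


def furin_sites(residues: List[str]) -> List[int]:
    """Return indices of the P1 Arg after which furin would cut.

    Scans for Arg-X-(Arg|Lys)-Arg followed by a residue Y that is not in
    BLOCKING_P1_PRIME. A motif at the very end of the chain is skipped,
    since cutting there would release nothing.
    """
    sites: List[int] = []
    for i in range(len(residues) - 4):
        p4, _, p2, p1, p1_prime = residues[i:i + 5]
        if (p4 == "Arg" and p2 in {"Arg", "Lys"} and p1 == "Arg"
                and p1_prime not in BLOCKING_P1_PRIME):
            sites.append(i + 3)
    return sites


print(furin_sites(["Ser", "Arg", "Ala", "Lys", "Arg", "Asp", "Gly"]))  # [4]
print(furin_sites(["Ser", "Arg", "Ala", "Lys", "Arg", "Leu", "Gly"]))  # []
```

For example, the spacer Arg-Arg-Lys-Arg-Lys (SEQ ID NO:29) followed by a leucine enhancer yields no site under these rules, because its only Arg-X-Arg/Lys-Arg motif is followed by lysine.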

When fusion proteins containing Lys-Arg or Arg-Arg in the hydrophilic linker region
are to be expressed in mammalian cells, non-endocrine cell lines, such as monkey
kidney-derived cell lines [CV-1 (ATCC CCL 70), COS-1 (ATCC CRL 1650) or COS-7 (ATCC
CRL 1651)] or a Chinese hamster ovary cell line [CHO-K1 (ATCC CCL 61)], are employed to
prevent cleavage of the fusion protein in vivo (i.e., prior to affinity purification).

Several cell lines of higher eukaryotes cannot correctly process pro-insulin (i.e., COS
and CV-1) [Laub O., J. Biol. Chem. 258:6043 (1983)] or preproglucagon [BHK fibroblasts
(ATCC 6281)] [Drucker D.J., J. Biol. Chem. 261:9637 (1986)]; the inability to process these
proteins suggests that these cell lines are deficient in the enzymes that are responsible for the
proteolytic processing which occurs in the specialized secretory cells of the endocrine system.
natural dibasic sites preferred cell lines for the production of recombinant fusion proteins
which contain Lys-Arg or Arg-Arg in the hydrophilic linker region.

The presence of a dibasic recognition site alone is not sufficient to allow proteolytic
cleavage, as many hormones and growth factors have internal dibasic sites (i.e., sites located
within the sequences encoding the mature form of the protein) that are not cleaved during
secretion. A study of sequences encoding prosomatostatin derived from several species
suggests that the general exposure (i.e., location on the exterior of the molecule) and
conformation of the dibasic site may influence whether a particular site is susceptible to
cleavage [Warren, Cell 39:547 (1984)]. The enzymes responsible for dibasic cleavage in the
constitutive secretion pathway (i.e., non-regulated secretion) have been characterized; these
enzymes are termed furin or PACE. Furin and PACE require an arginine at the P4 site for
cleavage [Hatsuzawa et al., J. Biol. Chem. 267:16094 (1992)]. The specificities of furin and
PC1/PC3 enzymes from the endocrine system have been compared [Nakayama, J. Biol. Chem.
267:16335 (1992)] and found to be similar [the recognition sequence for furin is
Arg-X-Lys/Arg-Arg (SEQ ID NOS:14 and 15); the recognition site for PC1/PC3 is
Arg-X-X-Arg (SEQ ID NO:40)]. Therefore, furin-like activity is found in both endocrine and
constitutive secretion systems.



Expression In Non-Endocrine Mammalian Cell Lines 

The NIH3T3, HepG2, COS-7 and CHO cell lines are examples of constitutively
secreting cell lines that produce furin in varying amounts to process pro-regions at the motif
Arg-X-Lys/Arg-Arg (Yanagita, supra). Proteins that naturally have amino-terminal pro
regions are ideal candidates for the carboxy-terminal fusion designs of the present invention.
High-level expression of pro-proteins can overwhelm the amount of natural furin
activity, but when furin is cotransfected with the pro-protein fusion this limitation can be
overcome (Hatsuzawa K. et al., supra). The optimum motif for furin cleavage is
Arg-X-Lys/Arg-Arg, and although most proproteins produced by the constitutive secretion
pathway contain this recognition sequence, proproteins of the endocrine secretory route do not.
Secreted molecules of the endocrine system can be produced in constitutive secretion cells
according to the methods of the present invention by fusing the carboxy-terminus of the
mature protein to the hydrophilic spacers described and joining that fusion to the KpnI site of
the leCr] fragment (discussed in detail below). The amino-terminal pro cleavage site would
By combining the expression of preproprotein fusion molecules with a host that
contains a high level of furin activity, the resulting secreted product will be affinity purified
from the media with a processed amino-terminus. Separation of the desired molecule from
the affinity domain by a specific protease results in a purified, naturally folded and
glycosylated protein. The remaining hydrophilic spacer and endoprotease sequence can be
sequentially removed by digestion with carboxypeptidase A, carboxypeptidase Y and/or
carboxypeptidase B.



Generation of Authentic Proteins Using Carboxypeptidases 

Recognition sequences of more specific proteases are used when the desired product is
susceptible to proteolytic degradation by the Arg and/or Lys proteases. Removal of the
carboxy-terminal hydrophilic linker and remaining proteolytic recognition sequence is
accomplished sequentially with carboxypeptidases of varying specificities. Enzymes that
can be used include, but are not limited to, carboxypeptidase B, carboxypeptidase A,
carboxypeptidase Y, carboxypeptidase C, cathepsin A, malt carboxypeptidases I/II and
carboxypeptidase P. All of these enzymes exhibit preferences for amino acids in the ultimate
and penultimate positions of the substrate. The P1 (penultimate) and P1' (ultimate)
specificities and preferences of serine carboxypeptidases are reviewed in [Breddam K.,
Carlsberg Res. Commun. 51:83-128 (1986)].

The present invention uses immobilized carboxypeptidases to sequentially and
specifically remove amino acids from the carboxy-termini of recombinant fusion proteins
following cleavage with endoproteases. CPA releases different amino acids at different rates
(Ambler, supra). The following amino acids are released rapidly by CPA: tyrosine,
phenylalanine, tryptophan, leucine, isoleucine, methionine, threonine, glutamine, histidine,
alanine, valine and homoserine. The following amino acids are released slowly by CPA:
asparagine, serine, lysine (the rate of lysine release may be modified by changing the pH of
the digestion) and MetSO2. The following amino acids are released very slowly by CPA:
glycine, aspartic acid, glutamic acid, CySO3H and S-carboxymethylcysteine. The following
amino acids are not released by CPA: proline, hydroxyproline and arginine. The presence of
an amino acid which is either very slowly released or not released in the penultimate position
will generally decrease the rate of release of the carboxy-terminal residue by CPA. CPB has
a much narrower specificity than CPA; CPB removes only arginine and lysine residues.
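The CPA release-rate classes listed above can be captured in a small lookup table. The Python sketch below is illustrative only: it encodes the classes as given in the text (the abbreviations Hse for homoserine, CmCys for S-carboxymethylcysteine and Hyp for hydroxyproline are shorthand chosen here), ignores the penultimate-residue effects described above, and assumes an exhaustive digestion that stops at the first residue CPA cannot release.

```python
from typing import List, Optional, Tuple

# CPA release-rate classes as listed in the text (after Ambler).
# Hse = homoserine, CmCys = S-carboxymethylcysteine, Hyp = hydroxyproline (shorthand chosen here).
CPA_RATE = {
    **dict.fromkeys(["Tyr", "Phe", "Trp", "Leu", "Ile", "Met", "Thr",
                     "Gln", "His", "Ala", "Val", "Hse"], "rapid"),
    **dict.fromkeys(["Asn", "Ser", "Lys", "MetSO2"], "slow"),
    **dict.fromkeys(["Gly", "Asp", "Glu", "CySO3H", "CmCys"], "very slow"),
    **dict.fromkeys(["Pro", "Hyp", "Arg"], "not released"),
}
RATE_ORDER = ["rapid", "slow", "very slow"]


def cpa_exhaustive(tail: List[str]) -> Tuple[List[str], Optional[str]]:
    """Simulate CPA digestion run to completion on a peptide written N- to C-terminal.

    Returns the residues removed (in order of release) and the slowest rate class
    that had to be passed. Residues missing from the table are treated as not
    released (an assumption), and penultimate-residue effects are ignored.
    """
    removed: List[str] = []
    for residue in reversed(tail):  # CPA works inward from the carboxy-terminus
        if CPA_RATE.get(residue, "not released") == "not released":
            break
        removed.append(residue)
    slowest = max((CPA_RATE[r] for r in removed), key=RATE_ORDER.index, default=None)
    return removed, slowest


removed, slowest = cpa_exhaustive(["Arg", "Arg", "Lys", "Leu", "Asp", "Asp", "Asp", "Asp", "Lys"])
print(removed, slowest)
```

Applied to the enterokinase remnant Arg-Arg-Lys-Leu-Asp-Asp-Asp-Asp-Lys, the sketch strips seven residues, halts at the arginines, and reports the aspartic acid run as the rate-limiting "very slow" class.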
CPA and CPB have defined limitations as to their removal of carboxy-terminal amino
acids and are used to digest remaining linker sequence to completion; therefore, traditional
immobilization media such as activated CNBr agarose beads can be used. Immobilized CPA
digestions can be incubated to completion because the hydrophilic spacers protect the protein
of interest by encoding an arginine residue which CPA cannot remove. (All hydrophilic
spacers contain at least one arginine; the lysine triplet used to generate an extended
hydrophilic spacer is used in combination with spacers which contain an arginine or
alternatively may be used as the spacer when the protein of interest terminates with an
arginine residue.) Alternative immobilization media are needed to control the hydrolysis of the
carboxy-terminal amino acids when CPD-Y is used as the exoprotease, because CPD-Y does
not have the specific substrate limitations of CPA and CPB. CPD-Y attached to traditional
immobilization media (e.g., agarose) produces a wide variety of digestion products. This
heterogeneous population of digested products is useful when attempting to determine the
organization of amino acids at the carboxy-terminus (i.e., for determination of protein
sequences). Extensive proteolytic digestion is likely to occur as a result of the peptide entering
diffusion zones where the enzyme concentration is high and the rate of diffusion is slow.
The desired effect when performing CPD-Y digestions is a uniform, but limited, removal of a
specific amino acid (proline) from a large homogeneous population of molecules. This can
only be accomplished by limiting the time that a high, uniform concentration of the CPD-Y
enzyme is allowed to interact with limiting concentrations (i.e., below the Km) of substrate.
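The requirement that substrate be held below Km can be read off the standard Michaelis-Menten rate law. The reduction below is a textbook result added for illustration; it assumes the enzyme concentration [E] is constant, as it is for an immobilized enzyme, and that [S] stays well below Km throughout the contact time t.

```latex
v = \frac{k_{\mathrm{cat}}[E][S]}{K_m + [S]}
\;\approx\; \frac{k_{\mathrm{cat}}}{K_m}[E][S] = k_{\mathrm{obs}}[S]
\quad ([S] \ll K_m),
\qquad
[S](t) = [S]_0 \, e^{-k_{\mathrm{obs}} t},
\qquad
k_{\mathrm{obs}} = \frac{k_{\mathrm{cat}}}{K_m}[E].
```

Under these conditions each molecule is processed with the same first-order rate, so the fraction processed, 1 - exp(-k_obs t), depends only on the local enzyme concentration and the contact time and not on how much substrate is present. This is consistent with the emphasis on a high, uniform enzyme concentration and a limited contact time.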

To achieve uniform processing, the immobilization media must have limited diffusion
zones for the substrate to enter and provide a high enzyme binding capacity. Examples of
media meeting these requirements are nitrocellulose and nylon sheets with 0.45 micron pores
(Schleicher & Schuell), "Spectra/mesh Nylon" filters with a percent open area of less than
10% (Spectrum), and Acti-Disk/Acti-Mod cartridges (Arbor Technologies and U.S. Patent
Nos. 3,862,030 and 4,169,014; the disclosures of these patents are hereby incorporated by
reference). All of these media have sites available for the preferred enzyme immobilization
method via reductive amination using glutaraldehyde as a linker/functional group [Hermanson,
Mallia and Smith, Chapter 2, Immobilized Affinity Ligand Techniques, Academic Press Inc.,
San Diego, CA (1992)].

Enzymes immobilized to these limited-diffusional matrices provide excellent control
over the amount of time the substrate is exposed to the enzyme. Immobilization of
processing based on the controlled exposure of substrate to enzyme. Digestion rates for each
carboxypeptidase are controlled by changing salt concentration, pH and flow rates past the
immobilized enzyme.
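Because digestion is controlled through the flow rate past the immobilized enzyme, a rough estimate of contact time and conversion can help when choosing a flow rate. The sketch below assumes plug flow through a bed whose accessible volume equals its nominal volume, and first-order kinetics (substrate below Km); the rate constants and all numbers in the example are hypothetical, not values from the text.

```python
import math


def residence_time_min(bed_volume_ml: float, flow_rate_ml_per_min: float) -> float:
    """Mean contact time of the substrate with the enzyme bed, assuming plug flow."""
    return bed_volume_ml / flow_rate_ml_per_min


def fraction_processed(k_obs_per_min: float, contact_min: float, passes: int = 1) -> float:
    """Fraction of molecules that have lost the target residue after `passes` passes.

    Assumes first-order removal (substrate well below Km) with the same rate constant
    on every pass, so the unprocessed fraction is multiplied by exp(-k*t) each time.
    """
    unprocessed_per_pass = math.exp(-k_obs_per_min * contact_min)
    return 1.0 - unprocessed_per_pass ** passes


# Hypothetical example: 1 ml enzyme bed, 0.5 ml/min flow, two residues with very different rates.
t = residence_time_min(1.0, 0.5)               # 2.0 min per pass
fast = fraction_processed(0.35, t, passes=4)   # a readily removed residue
slow = fraction_processed(0.005, t, passes=4)  # a poorly removed residue
print(f"contact {t:.1f} min/pass, fast residue {fast:.1%}, slow residue {slow:.1%}")
```

With these made-up constants, four passes remove about 94% of the fast residue while touching only about 4% of the slow one, illustrating how a difference in rate constants combined with a controlled total contact time yields selective removal. In this simple model, n passes of time t are equivalent to a single pass of time n·t.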

Carboxypeptidase A can release a wide variety of amino acids from the carboxy
terminus at varying rates, except proline and arginine (Ambler R.P., supra). The strategy of
alternating between carboxypeptidase A and B is used when the cleavage sequence does not
contain any prolines. The enterokinase recognition sequence used in Level 2 designs is an
example of this strategy. The sequence Arg-Arg-Lys-Leu-Asp-Asp-Asp-Asp-Lys (SEQ ID
NO:41) remains after cleavage of the fusion protein (see Figure 2). The lysine residue can be
removed by digestion with CPA or CPB at pH 8.0 at 25°C. The release of the lysine,
aspartic acid and leucine residues by CPA is very slow at room temperature, but the reaction
rate can be increased by raising the temperature to 37°C and lowering the pH to less than 6.2
(Ambler R.P., supra). The reaction can be allowed to go to completion (stopping at the
arginine residues) as long as suitable protease inhibitors are present (e.g.,
diisopropylfluorophosphate). Authentic protein is generated by removing the remaining
arginine residues with carboxypeptidase B.

In circumstances where carboxypeptidase A cannot remove the remaining amino acids
from the protease recognition sequence, alternate digestion protocols are used. Since the
sequence of amino acids to be removed from the protein of interest is known, the enzymes
used are chosen based on their specificity. For example, cleavage of the thrombin site results
in the following amino acids remaining attached to the protein of interest:
Arg-Lys-Lys-Lys-Leu-Val-Pro-Arg (SEQ ID NO:42). Carboxypeptidase B is used to remove the
terminal arginine. Carboxypeptidase Y is used to remove the proline residue. This reaction is
slow, but having a valine residue in the penultimate position enhances the binding and cleavage
rate. The lysine triplet not only provides a hydrophilic spacer, it also provides a barrier to
excessive carboxypeptidase Y digestion. Lysine is removed very slowly, and is also the least
preferred amino acid to have in the penultimate position. Thus, the lysine pair is a formidable
obstacle for CPD-Y digestion. Multiple passes (about 3 or 4) of the cleaved protein through an
immobilized carboxypeptidase Y medium at a rate suitable to remove the carboxy-terminal
proline ensure that the digestion will go to completion (i.e., approximately 100% past proline
and approximately 0% past arginine). Immobilized CPA is used to remove any remaining
leucine, valine and lysine residues, and a final digestion with CPB is used to generate the
authentic protein.
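The enzyme order described above (CPB, then CPD-Y, then CPA, then CPB) can be traced with a toy model in which each carboxypeptidase is reduced to a rule for which carboxy-terminal residues it accepts. This is a simplified sketch, not the method itself: CPB is taken to remove only Arg and Lys and CPA anything except Arg and Pro, as stated in this document; the time-limited CPD-Y step is modeled as removing exactly one residue (in practice it may also trim some valine and leucine); rates and penultimate effects are ignored; and the protein's carboxy-terminal stub Gly-Ser-Ala-Phe is hypothetical.

```python
from typing import Callable, List, Optional


def cpb(residue: str) -> bool:
    return residue in {"Arg", "Lys"}


def cpa(residue: str) -> bool:
    return residue not in {"Arg", "Pro"}


def cpy(residue: str) -> bool:
    # CPD-Y is broadly specific; in this model its extent is set by contact time only.
    return True


def digest(chain: List[str], accepts: Callable[[str], bool],
           max_residues: Optional[int] = None) -> List[str]:
    """Remove residues from the carboxy-terminus (end of the list) while `accepts` allows it."""
    chain = list(chain)
    removed = 0
    while chain and accepts(chain[-1]) and (max_residues is None or removed < max_residues):
        chain.pop()
        removed += 1
    return chain


protein = ["Gly", "Ser", "Ala", "Phe"]                              # hypothetical C-terminal stub
remnant = ["Arg", "Lys", "Lys", "Lys", "Leu", "Val", "Pro", "Arg"]  # SEQ ID NO:42

chain = protein + remnant
chain = digest(chain, cpb)                  # removes the terminal Arg, stops at Pro
chain = digest(chain, cpy, max_residues=1)  # time-limited CPD-Y step removes Pro
chain = digest(chain, cpa)                  # removes Val, Leu and the Lys triplet, stops at Arg
chain = digest(chain, cpb)                  # removes the spacer Arg, stops at Phe
print("-".join(chain))
```

The model also shows why the spacer arginine matters: without it, the exhaustive CPA step would continue into the protein of interest itself. Conversely, if the native carboxy-terminal residue of the protein were Arg or Lys, the final CPB step would remove it, which is the situation the CPB-terminator design in step 7 addresses.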
Expression And Purification Of The Recombinant Fusion Protein

Once a suitable recombinant DNA vector encoding the desired fusion protein has been
constructed, the vector is introduced into the desired host cell. Prokaryotic hosts are
transformed with DNA molecules using standard protocols. Briefly, the host cells are made
competent by treatment with calcium chloride solutions (competent bacterial cells are
commercially available and are easily made in the laboratory). This treatment permits the
uptake of DNA by the bacterial cell. Another means of introducing DNA into bacterial cells
is electroporation, in which an electrical pulse is used to permit the uptake of DNA by
bacterial cells.

Standard protocols exist for the introduction of DNA molecules into eukaryotic hosts,
including yeast and higher eukaryotes. DNA may be efficiently transferred into eukaryotic
cells by calcium phosphate-DNA co-precipitation, DEAE-dextran-mediated transfection,
electroporation, microinjection, lipofection, protoplast fusion, retroviral infection, particle
bombardment (e.g., biolistics) and the like.

Following the introduction of DNA into a host cell, selection pressure may be applied
to isolate those cells which have taken up the DNA. Prokaryotic vectors will contain an
antibiotic-resistance gene, such as the ampicillin, kanamycin or tetracycline resistance genes.
Growth in the presence of the appropriate antibiotic indicates the presence of the vector DNA.
Selectable markers exist for yeast and higher eukaryotic cells as well. In yeast, the DNA
vector typically contains a gene required for the synthesis of an essential metabolite which the
host cell cannot make. The ability of the transformed yeast cell to grow in the absence of that
metabolite indicates the presence of the DNA in the yeast cell. In mammalian cells, genes
encoding selectable markers such as aminoglycoside phosphotransferase, which confers
resistance to neomycin, hygromycin B phosphotransferase, which confers resistance to
hygromycin, thymidine kinase, dihydrofolate reductase, xanthine-guanine phosphoribosyl
transferase, adenosine deaminase, CAD, and asparagine synthetase are used to isolate cells
which have incorporated vector sequences (for review see Sambrook, J. et al., supra at
16.8-16.15).

Following the isolation of host cells harboring the DNA vector sequences, the protein
encoded by the vector may be expressed. In prokaryotic hosts the bacteria are grown to a
suitable density (OD600 of 0.4-0.6) and then transcription from the promoter is induced. The
manner of induction will vary depending upon which promoter is selected. When the tac
promoter is utilized, induction is achieved by the addition of 0.1 M IPTG to the medium and

Current Protocols Mol. Biol. 19:16.6.1-16.6.14 (1990)]. When the lambda promoter is
utilized, induction is achieved by a shift in temperature from 30°C to 40-45°C [Shatzman,
A.R. et al., Curr. Protocols Mol. Biol. 11:16.3.1-16.3.11 (1990)]. The induction of other
prokaryotic promoters is well known in the art.

Following the induction of protein expression from the vector, the fusion protein is
harvested from the bacteria. If the fusion protein was secreted into the periplasmic space,
then bacteria are pelleted by centrifugation and the supernatant is discarded. The fusion
protein is released from the periplasm by a cold osmotic shock (Riggs, P., supra at 16.6.7).
The pelleted cells are resuspended in 30 mM Tris-Cl/20% sucrose, pH 8.0 (Tris/sucrose).
Thirty milliliters of Tris/sucrose is added per gram of cells (wet weight) and EDTA is added
to a final concentration of 1 mM. The cells are incubated at room temperature for 5-10
minutes without shaking or stirring. The cells are then centrifuged and resuspended in
80 ml per gram of ice-cold 5 mM MgSO4 and shaken or stirred for 10 minutes while kept at
5°C using an ice bath. The cells are then centrifuged and the resulting osmotic shock fluid
(supernatant) is subjected to affinity purification to isolate the fusion protein. If the
fusion protein was secreted into the culture medium, the bacteria are removed by
centrifugation and the culture supernatant is retained for affinity purification of the secreted
fusion protein.
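The cold osmotic shock volumes scale with the wet weight of the cell pellet. The helper below is a convenience sketch; the 0.5 M EDTA stock concentration is an assumption (the text only specifies the 1 mM final concentration), and the EDTA volume is computed so that the final concentration is exact after the stock is added.

```python
def osmotic_shock_volumes(wet_weight_g: float, edta_stock_mM: float = 500.0) -> dict:
    """Buffer volumes (ml) for the cold osmotic shock, per the ratios given in the text.

    30 ml Tris/sucrose per gram of cells, EDTA to 1 mM final, then 80 ml of
    ice-cold 5 mM MgSO4 per gram. The EDTA stock concentration is an assumed default.
    """
    tris_sucrose_ml = 30.0 * wet_weight_g
    # Solve stock * v / (tris_sucrose_ml + v) = 1 mM for v, the EDTA stock volume.
    edta_stock_ml = tris_sucrose_ml * 1.0 / (edta_stock_mM - 1.0)
    mgso4_ml = 80.0 * wet_weight_g
    return {"tris_sucrose_ml": tris_sucrose_ml,
            "edta_stock_ml": edta_stock_ml,
            "mgso4_ml": mgso4_ml}


print(osmotic_shock_volumes(2.0))
```

For a 2 g pellet this gives 60 ml of Tris/sucrose, about 0.12 ml of the assumed 0.5 M EDTA stock, and 160 ml of the MgSO4 solution.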

The choice of the affinity matrix depends upon the fusion partner used to create the
fusion protein. When the Fc domain of an immunoglobulin G molecule is utilized, the
affinity matrix selected is either protein A- or protein G-Sepharose. Protein A and protein G
bind with high affinity to the Fc domain of IgG. Well-characterized purification protocols are
available for protein A and protein G, and the corresponding Sepharose resins are
commercially available (Pharmacia). If the fusion partner is GST, the affinity matrix used is
glutathione-agarose. When MBP is utilized as the fusion partner, the fusion protein is affinity
purified on an amylose resin.

Following affinity chromatography, the protein of interest is separated from the fusion
partner by proteolytic cleavage. The site-specific protease used for the cleavage will depend
upon which cleavage site was used in the vector. A vector containing the protein of interest
and the fusion partner without the cleavage site for the site-specific protease is used to express
a control fusion protein. The control fusion protein is used to test the ability of the
site-specific protease to cleave at residues internal to the protein of interest. This control
protein need only be produced in a small culture to provide enough protein for a test of
cleavage by the desired protease.
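Before the control digestion, the known sequence of the protein of interest can be screened for candidate internal sites. The Python sketch below does a simple exact-match search using commonly cited minimal motifs for thrombin (Leu-Val-Pro-Arg-Gly, cut after Arg), enterokinase (Asp-Asp-Asp-Asp-Lys, cut after Lys) and Factor Xa (Ile-Glu-Gly-Arg, cut after Arg); these motif choices are assumptions of the sketch. Real proteases can also cut non-canonical sites, so a clean screen does not replace the control fusion protein digest.

```python
import re
from typing import Dict, List

# Minimal recognition motifs (one-letter code) and the cut offset within each motif.
# These exact-match motifs are a simplification chosen for this sketch.
PROTEASE_SITES = {
    "thrombin": ("LVPRG", 4),      # Leu-Val-Pro-Arg | Gly
    "enterokinase": ("DDDDK", 5),  # Asp-Asp-Asp-Asp-Lys |
    "factor Xa": ("IEGR", 4),      # Ile-Glu-Gly-Arg |
}


def internal_sites(protein: str) -> Dict[str, List[int]]:
    """Return {protease: [0-based index of the first residue after each cut]}."""
    hits: Dict[str, List[int]] = {}
    for name, (motif, offset) in PROTEASE_SITES.items():
        # A zero-width lookahead also finds overlapping occurrences.
        cuts = [m.start() + offset for m in re.finditer(f"(?={motif})", protein)]
        if cuts:
            hits[name] = cuts
    return hits


# Hypothetical sequence with one thrombin-like and one enterokinase-like site.
print(internal_sites("MKTAYIAKQRLVPRGSDDDDKA"))
```

A hit here points toward choosing a vector with a different linker, as in step 2 of the protocol overview.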



Protocol Overview 

The invention is illustrated by the following examples in which MBP/Ig, NGF/Ig and
BDNF/Ig fusion proteins are expressed and used to generate authentic MBP, NGF and BDNF.
However, the invention is not limited to the production of any specific recombinant protein.
To generate any desired fusion protein capable of producing an authentic protein of interest,
the following steps are taken:

1. Insertion of a DNA sequence encoding the protein of interest into a vector
containing the DNA sequences encoding the desired hydrophilic spacer sequence and the
desired fusion partner. The sequences encoding the protein of interest are inserted upstream
of the linker sequence such that the resulting fusion protein comprises the protein of interest
at the amino-terminus.

2. Insertion of a DNA sequence encoding the protein of interest into a vector
lacking DNA sequences encoding the desired linker sequence and containing the DNA
sequences encoding the desired fusion partner. This vector is constructed to provide a
control fusion protein which lacks the cleavage site for the site-specific protease present in the
linker sequences of the vector in step 1. The control fusion protein is digested with the
site-specific protease designed to cleave within the linker sequences. No cleavage should
occur unless a site for cleavage appears internal to the protein of interest. In such a case, a
different protease site is selected by inserting the DNA encoding the protein of interest into a
vector harboring a different linker.

3. The vector containing the protein of interest is transferred into the desired
prokaryotic or eukaryotic host. Appropriate selective pressure is applied to isolate hosts
containing the vector sequences. For example, if the vector is used to transform E. coli and
the vector contains the ampicillin resistance gene, the transformed bacteria are grown in the
presence of 20-60 µg/ml ampicillin to select for those bacteria which have taken up the vector
sequences. If the vector contains the neomycin resistance gene and is introduced into a
eukaryotic host, such as a mammalian cell line, the cell line is grown in the presence of
200-400 µg/ml G418 to select for those cells which have taken up the vector sequences.

4. Transcription is induced if a controllable promoter is used. If the promoter
promoter used is the lambda PL promoter, induction is achieved by increasing the temperature
at which the bacteria are grown from 30°C to 40-45°C.

5. Following induction, the host cells are grown for an appropriate period of time 
to allow for the expression of the fusion protein. In the case of bacterial hosts, the induced 
bacteria are typically grown for 2-4 hours before the bacteria are harvested by centrifugation
for 10 min at 7000 x g at 4°C. If the fusion protein is transported to the periplasm, the 
supernatant is discarded and the fusion protein is released from the bacteria by a cold osmotic 
shock. The shock fluid is then subjected to affinity resin chromatography to isolate the fusion 
protein. In cases where the protein is secreted into the medium, the supernatant is saved and 
the protein is concentrated by ammonium sulfate precipitation in preparation for affinity resin 
chromatography. 

When eukaryotic hosts such as mammalian cell lines are used, clones which stably 
express the fusion protein are isolated (i.e., stable transformants or stable clones) and 
induction may not be necessary (i.e., the promoter is constitutively transcribed). If the fusion
protein contains a signal sequence, it will be secreted into the culture medium in which the
mammalian cell is grown. In this case, the stable clone may be expanded into batch cultures
from which the fusion protein can be isolated from the spent medium every 2-4 days 
depending on the growth characteristics of the established stably transformed cell line. Batch 
growths are typically maintained for 10-30 days depending on the growth characteristic of the 
stably transformed cell line. 

In a preferred embodiment, a fusion protein composed of the protein of interest and a 
portion of an immunoglobulin molecule is expressed. The fusion protein will form a dimer 
between the immunoglobulin domains creating a product having a molecular weight of greater 
than 50 kD. Such a large protein will be retained inside the lumen of the hollow fiber reactor 
with a molecular weight cutoff greater than 50 kD (Unisyn Technologies, Tustin, CA) 
permitting the batch harvest of concentrated product from the interior of the hollow fiber with 
limited amounts of low molecular weight contaminants. The use of hollow fiber reactors with
large molecular weight cutoffs is preferred because it allows for the batch harvesting of
fusion protein. The large pores present in these hollow fiber reactors allow the exchange of 
essential nutrients and waste products between the growth medium and the lumen of the fiber. 
This structure permits an increase in the growth rate of the cells and thereby increases 
If the fusion protein is not secreted, the eukaryotic cells are harvested using any
appropriate method. If the cells grow attached to the culture dish, they are harvested by
treatment with trypsin or by manually scraping the cells from the culture vessel. The cells
are pelleted by centrifugation and washed three times with PBS, pH 7.2. [If the cells are
released from the dish by treatment with trypsin, the trypsin is removed from the cell pellet
by washing the cells (three times with PBS, pH 7.2) following collection by centrifugation.
Soybean trypsin inhibitor (Sigma) and/or aprotinin (Sigma) may also be included at a
concentration of 1-2 µg/ml.] The pelleted cells are then lysed by mechanical disruption or
chemical treatment. The cell debris is removed by centrifugation and the supernatant is
subjected to affinity resin chromatography to isolate the fusion protein.

6. Affinity purification of fusion proteins: the supernatant containing the fusion
protein (shock fluid, culture medium, supernatant from disrupted cells) is applied to an
appropriate affinity matrix to isolate the fusion protein. For example, if the fusion partner
utilized is the IgG Fc domain, then a SPA-Sepharose resin (Pharmacia) is used to selectively
bind the fusion protein. The supernatant is applied to the resin, the resin is washed to remove
proteins which do not bind, and then the fusion protein is eluted from the resin using an
appropriate agent. In the case of the SPA-Sepharose resin, elution is achieved with 0.1 M
citric acid, pH 2.8, or other low pH buffer such as 0.1 M glycine-HCl, pH 3.0. The purified
fusion protein is then cleaved with an endoprotease to generate the authentic protein of interest.

In a preferred embodiment, the desired protein is released from the immobilization matrix by
digestion with the specific endoprotease that cleaves between the affinity domain and the
desired protein. This eliminates the need to re-adsorb the affinity domain after proteolytic
separation and avoids the harsh low pH elution step.

7. Endoprotease cleavage of fusion proteins: if the purified fusion protein
contains an Ig hinge region, it is first digested with a site-specific endoprotease which cleaves
at a sequence located within either the hydrophilic spacer or the hinge region of Ig.
Both the fusion protein containing the site for a site-specific endoprotease and the
corresponding control fusion protein lacking the cleavage site are digested with the
endoprotease. This is done to test whether the protein of interest is cleaved by the
site-specific endoprotease at a site internal to the protein of interest. Generally the amino acid
sequence of the protein of interest is known and a site-specific endoprotease is selected which
does not have a recognition site internal to the protein of interest. However, different
or the site-specific endoprotease may cleave at a non-preferred site located within the protein
of interest. The fusion protein is cleaved into its parts once a suitable site-specific 
endoprotease is found which cleaves only at the desired site. 

Cleavage with the site-specific endoprotease may leave extra amino acids on the
carboxy-terminal end of the protein of interest (i.e., for Level 2 and 3 designs). These amino
acids remain as a result of the amino acids present on the amino-terminal side of the cleavage
site for the site-specific endoprotease as well as those within the hydrophilic spacer. These
undesirable (i.e., non-authentic) amino acids are removed by digestion with carboxypeptidases.
Carboxypeptidases cleave carboxy-terminal amino acids. Carboxypeptidase A cleaves
carboxy-terminal amino acids other than arginine or proline. Carboxypeptidase B cleaves
only carboxy-terminal arginine or lysine residues. For example, if the fusion protein is
cleaved at the following thrombin site: Leu-Val-Pro-Arg-Gly-Thr (SEQ ID NO:43), located
within the following sequence:
Protein of interest-Arg-Arg-Lys-Lys-Lys-Leu-Val-Pro-Arg-Gly-Thr-IgG hinge/Fc,
then following cleavage with thrombin, the protein of interest will have the following extra
carboxy-terminal amino acids:
Protein of interest-Arg-Arg-Lys-Lys-Lys-Leu-Val-Pro-Arg.
Treatment with immobilized carboxypeptidase B will remove the carboxy-terminal arginine
residue. Digestion with carboxypeptidase Y at pH 5.75 will remove the proline residue and
most of the valine and leucine residues. Digestion with carboxypeptidase A at pH 6.0 will
remove the remaining valine and leucine residues; the enzyme will slow down at the lysine
residues. Digestion with carboxypeptidase B will remove any remaining lysine residues and
the arginine tail, yielding an authentic carboxy-terminus of the protein of interest.
Alternating carboxypeptidase digestions can be used to generate an authentic protein of interest
when the linker utilized contains arginine and/or lysine residues following the
carboxy-terminus of the protein of interest.

When the natural carboxy-terminus of the protein of interest comprises an arginine
residue, the linker utilized will contain a leucine, valine or isoleucine residue between the
naturally occurring arginine on the protein of interest and the arginine/lysine residues in the
spacer. These residues (Leu, Val, Ile) are preferred when expression of the fusion protein is
achieved in a mammalian cell line in order to prevent the possibility of undesirable cleavage
of the fusion protein by furin after the arginine located at the carboxy-terminus of the protein
of interest. During processing of the released protein of interest, carboxypeptidase B will
(referred to as a CPB terminator). Carboxypeptidase A is then used to efficiently remove the
leucine, valine or isoleucine residue while leaving the naturally occurring arginine residue
intact as the carboxy-terminal residue of the protein of interest.

8. Purification of the authentic proteins of interest: following carboxypeptidase
digestion to remove extra carboxy-terminal amino acids, a final purification step using cation
exchange and gel filtration chromatography is employed to remove released amino acids and
separate any undigested fusion protein and partially processed protein of interest from the
authentic protein of interest.

9. Confirmation of the carboxy-terminal residues of the protein of interest may be
obtained by analysis of purified and cleaved protein using known automated carboxy-terminal
amino acid sequence analysis methods [e.g., Miller, C.G. and Bailey, J.M. (1994) Genetic
Eng. News, Sept. 15, 1994, p. 16]. The processed fusion proteins may be subjected to
automated C-terminal protein sequence analysis according to the manufacturer's
instructions [e.g., Hewlett-Packard G1009A C-terminal protein sequencing system; Miller,
et al., Techniques in Protein Chemistry VI (1995) Academic Press, Inc., pp. 219-227].

EXPERIMENTAL 

The following examples serve to illustrate certain preferred embodiments and aspects 
of the present invention and are not to be construed as limiting the scope thereof. 

In the experimental disclosure which follows, the following abbreviations apply: M
(molar); mM (millimolar); µM (micromolar); mol (moles); mmol (millimoles); µmol
(micromoles); nmol (nanomoles); gm (grams); mg (milligrams); µg (micrograms); pg
(picograms); L (liters); ml (milliliters); µl (microliters); cm (centimeters); mm (millimeters);
µm (micrometers); nm (nanometers); °C (degrees Centigrade); AMP
(adenosine 5'-monophosphate); cDNA (copy or complementary DNA); dNTP (deoxyribonucleotide
triphosphate); PBS (phosphate buffered saline); OD (optical density); HEPES
(N-[2-hydroxyethyl]piperazine-N'-[2-ethanesulfonic acid]); HBS (HEPES buffered saline); SDS
(sodium dodecylsulfate); Tris-HCl (tris[hydroxymethyl]aminomethane hydrochloride); Klenow
(DNA polymerase I large (Klenow) fragment); rpm (revolutions per minute); EGTA (ethylene
glycol-bis(β-aminoethyl ether) N,N,N',N'-tetraacetic acid); EDTA
(ethylenediaminetetraacetic acid); bla (β-lactamase or ampicillin-resistance gene); ORI (plasmid
origin of replication); lacI (lac repressor); ATCC (American Type Culture Collection,
Logan, UT); NEB (New England Biolabs, Inc., Beverly, MA); Novagen (Novagen, Inc.,
Madison, WI); Operon (Operon Technologies, Alameda, CA); Sigma (Sigma Chemical Co.,
St. Louis, MO); Stratagene (Stratagene Cloning Systems, La Jolla, CA). All restriction
enzymes were purchased from New England BioLabs and were used according to the
manufacturer's instructions, unless otherwise noted.

Unless otherwise specified, protein or peptide sequences are written from amino to 
carboxy-termini and nucleic acid sequences are listed in the 5' to 3' direction. 

For the production of recombinant proteins using this invention, it is necessary to use a
strain of bacteria carrying a mutation that prevents the expression of the ompT locus
[Grodberg and Dunn, supra]. Strains B834 (Novagen), BL21 (Novagen), and C1757 are
preferred due to their inability to cleave the proteolytically susceptible dibasic Lys-Arg site at
positions 172-173 of T7 RNA polymerase. A derivative of the DH1 strain, the AG1 strain
(Stratagene), was used for experiments due to its commercial availability and limited ability
to cleave the hydrophilic spacer/endoprotease regions during the isolation of fusion product.
B strains of E. coli are preferred due to their lon deficiency.

EXAMPLE 1 

Expression Of A MBP/IgG Fusion Protein In E. coli

The following experiments were conducted to demonstrate the advantages provided by
the use of the hydrophilic spacers and endoprotease sites in conjunction with the hinge and Fc
domain of IgG in the design of a fusion protein. A fusion protein comprising the secreted
form of the maltose-binding protein (MBP) as the protein of interest linked to an IgG affinity
tag via an Arg-Arg-Thrombin linker (a Level 3 design) was produced. The vector
pMA2TH-9 was constructed to produce this MBP/IgG fusion protein. This example involved
a) the construction of the pMA2TH-9 expression vector, b) expression and affinity
purification of the MBP/IgG fusion protein, c) cleavage of the MBP/IgG fusion protein with
thrombin and d) cleavage of the MBP/IgG fusion protein while immobilized on a Protein A
resin.



a) Construction Of pMA2TH-9 

The pMA2TH-9 vector was designed to allow expression of fusion proteins in
England BioLabs). pMAL-p2 encodes the maltose-binding protein (MBP) under the
transcriptional control of the inducible tac promoter. The pMAL-p2 vector encodes the lac
repressor (lacIq gene), which suppresses transcription until IPTG is added to the culture
medium. The pMAL-p2 vector contains sequence encoding the naturally occurring signal
sequence of the MBP (i.e., the malE signal sequence), which allows the MBP to be exported
into the periplasmic space of the host cell. The vector is designed such that the protein of
interest is inserted downstream (i.e., on the carboxy-terminal side) of the sequences encoding
the MBP in pMAL-p2. The resulting fusion protein is then purified using an amylose resin
which binds the MBP. An asparagine linker, a Factor Xa protease site and a polylinker are
positioned between the MBP sequences and the inserted protein of interest in the pMAL-p2
vector; this region of pMAL-p2 is termed the junction region and is shown in Figure 4.
These sequences were removed in the modified vector.

The pMAL-p2 vector was used as the starting vector because it encodes a secretory
protein (the MBP with the malE signal sequence), an example of the type of proteins ideally
suited for production using the expression systems of the present invention. In the modified
vector, the MBP acts as the protein of interest rather than as the fusion partner and affinity
domain (as is the case in the pMAL-p2 system).

pMAL-p2 was modified as follows. The unique NgoMI site, located at position 4778
on the map of pMAL-p2, was removed by ligating an excess of a self-annealed
oligonucleotide termed A1, which has the sequence 5'-CCGGGCGCGCGCGC-3' (SEQ ID
NO:44), into 200 ng of NgoMI-digested pMAL-p2. Subclones containing the desired
modification were identified by restriction analysis. The selected clone contained a BssHII
restriction site in place of the original NgoMI site and was designated pM-Ng(-). The
asparagine linker and polylinker cloning sites were removed from pM-Ng(-) by digestion with
SacI and HindIII. Following the digestion reaction, the plasmid was purified from the
released fragment using a CHROMA SPIN-400 gel filtration column (Clontech) according to
the manufacturer's protocol.

A conversion linker was next inserted into the pM-Ng(-) vector; this conversion linker
serves several functions. The linker encodes the hydrophilic spacer (Arg-Arg) and the
endoprotease site (thrombin), and it contains recognition sites for SalI, EcoRI and NheI which
permit the insertion of sequences encoding various amounts of the IgG hinge/Fc fragment (the
affinity domain). The conversion linker was formed by annealing the complementary
5'-CGTTTCGCCGGCTGGTTCCGCGGGGTCGACGGATTCAGCTAGCA-3' (SEQ ID
NO:45). The B2 oligonucleotide comprises the sequence
5'-AGCTTGCTAGCTGAATCCGTCGACCCCGCGGAACCAGCCGGCGAAACGAGCT-3'
(SEQ ID NO:46). The annealed B1/B2 oligonucleotide pair generates the following sequence,
which contains recognition sites for NgoMI, SalI, EcoRI and NheI:

5'-    CGTTTCGCCGGCTGGTTCCGCGGGGTCGACGAATTCAGCTAGCA-3'
3'-TCGAGCAAAGCGGCCGACCAAGGCGCCCCAGCTGCTTAAGTCGATCGTTCGA-5'
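The annealed conversion linker can be checked in a few lines. This Python sketch verifies that the two strands pair along the whole top strand with a 4-nt AGCT overhang at each end (the SacI- and HindIII-compatible ends), locates the four named restriction sites, and translates the top strand. The strand sequences are taken from the drawn duplex. The reading frame is an assumption: it takes the first in-frame codon of the insert to start at the third base, the frame in which the linker encodes the phenylalanine, Arg-Arg spacer and Leu-Val-Pro-Arg-Gly thrombin site described in the text.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, indexed in TCAG order for each codon position.
CODON = {a + b + c: AMINO[16 * i + 4 * j + k]
         for i, a in enumerate(BASES)
         for j, b in enumerate(BASES)
         for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    return "".join(CODON[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


TOP = "CGTTTCGCCGGCTGGTTCCGCGGGGTCGACGAATTCAGCTAGCA"             # 5'->3'
BOTTOM = "AGCTTGCTAGCTGAATTCGTCGACCCCGCGGAACCAGCCGGCGAAACGAGCT"  # bottom strand, 5'->3'

# Full pairing plus a 4-nt AGCT overhang on each end of the bottom strand.
paired = BOTTOM == "AGCT" + revcomp(TOP) + "AGCT"

RESTRICTION_SITES = {"NgoMI": "GCCGGC", "SalI": "GTCGAC", "EcoRI": "GAATTC", "NheI": "GCTAGC"}
positions = {name: TOP.find(site) for name, site in RESTRICTION_SITES.items()}

# Assumed frame: TOP[0:2] completes a codon begun in the vector, so translation starts at TOP[2].
peptide = translate(TOP[2:])
print(paired, positions, peptide)
```

In this frame the peptide begins F, RR, LVPRG: the phenylalanine added to the carboxy-terminus of MBP, the Arg-Arg hydrophilic spacer, and the thrombin site.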



The B1 and B2 oligonucleotides were annealed by placing 10 µl of each
oligonucleotide (100 µM) in a 100 µl total volume in a buffer comprising 20 mM Tris-HCl
(pH 8.0), 100 mM NaCl, 12 mM MgCl2. The mixture was placed in a 500 µl
microcentrifuge tube and heated for 10 minutes at 95°C; the reaction was slowly cooled to
60°C for 1 hour and then allowed to slowly cool to room temperature over a three hour
period. The annealed conversion linker was then ligated to the SacI/HindIII-digested
pM-Ng(-) plasmid as follows.

The conversion linker was ligated to the SacI/HindIII-digested pM-Ng(-) using 200 ng
of purified digested plasmid and the hybridized complementary oligonucleotides B1 and B2 in
a 3:1 insert to vector ratio; the resulting plasmid was termed pMA2-TH. A schematic
representation of the pMA2-TH plasmid is shown in Figure 5.
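The 3:1 insert-to-vector ratio refers to molar amounts, so the mass of linker needed depends on the lengths of both molecules. The sketch below uses the common approximation of about 660 g/mol per base pair of double-stranded DNA; the pM-Ng(-) size (6,700 bp) and the linker length (48 bp) in the example are assumed round numbers, not values given in the text.

```python
BP_MASS_G_PER_MOL = 660.0  # approximate average mass of one base pair of dsDNA


def pmol_dsdna(ng: float, length_bp: int) -> float:
    """Picomoles of double-stranded DNA in `ng` nanograms of a `length_bp` molecule."""
    return ng * 1e3 / (length_bp * BP_MASS_G_PER_MOL)


def insert_ng(vector_ng: float, vector_bp: int, insert_bp: int, insert_to_vector: float) -> float:
    """Mass of insert giving the requested insert:vector molar ratio.

    The per-base-pair mass cancels, so only the length ratio matters.
    """
    return insert_to_vector * vector_ng * insert_bp / vector_bp


vector_pmol = pmol_dsdna(200.0, 6700)        # assumed vector length
linker_ng = insert_ng(200.0, 6700, 48, 3.0)  # assumed linker length
print(f"{vector_pmol:.4f} pmol vector, {linker_ng:.2f} ng linker for 3:1")
```

With these assumed sizes, 200 ng of vector is roughly 0.045 pmol, and about 4.3 ng of annealed linker gives a 3:1 molar excess. The same insert_ng helper with insert_to_vector=1.0 covers the equimolar vector:IgG fragment ligation used to build pMA2-TH-IgG.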

In Figure 5, the location of the conversion linker downstream of the MBP coding region is indicated by the cluster of restriction sites. The coding regions for the lac repressor (lacIq), MBP and the ampicillin-resistance gene (β-lactamase) are shown, and the direction of transcription is indicated by the arrows. The tac promoter is also indicated (open arrowhead).

The B1/B2 oligonucleotide pair creates compatible overhangs for SacI and HindIII at the 5' and 3' ends, respectively. These restriction sites allow for the directional ligation of the conversion linker and also place a phenylalanine residue at the carboxy-terminus of the MBP protein. This phenylalanine provides an easily characterized amino acid that simplifies monitoring of the subsequent carboxy-terminal digestion reactions (by fluorescence; phenylalanine emission is detected at OD […]).

A 0.7 kb SalI/NheI fragment encoding the hinge and Fc domains of human IgG1 was inserted into the pMA2-TH vector to provide an affinity-purifiable domain on the fusion […] SalI/NheI IgG fragment was inserted into pMA2-TH digested with SalI and NheI to generate pMA2-TH-IgG. Figure 6 provides a schematic representation of pMA2-TH-IgG.

In Figure 6, the coding regions for the lac repressor (lacIq), MBP, the IgG fragment and the ampicillin-resistance gene (β-lactamase) are shown, and the direction of transcription is indicated by the arrows; selected restriction sites are indicated. The tac promoter is also indicated (open arrowhead). The location of the junction region is indicated.

The insertion of the IgG fragment into SalI- and NheI-digested pMA2-TH was achieved as follows. An equimolar ratio of vector and IgG fragment was used in the ligation. The ligation products were used to transform competent AG1 cells (Stratagene), and 10 clones containing inserts of the proper size were isolated; these clones were designated pMA2-TH-IgG 1-10.

The pMA2-TH-IgG clones were screened for the stable production of the fusion protein by the detection of human IgG. Briefly, the ten pMA2-TH-IgG clones were grown overnight in LB and used to inoculate 5 ml of LB containing 100 µg/ml of ampicillin in a 50 ml conical tube. The 5 ml cultures were incubated at 37°C, shaking at 235 rpm, for 90 min. IPTG was added to 1 mM final concentration and growth was continued for an additional 90 min. The induced cultures were pelleted, resuspended in 500 µl PBS, pH 7.4, and sonicated using a SONIPREP sonicator at a power setting that allowed maximum membrane disruption for 4 pulses of 20 sec. The sonicated cells were clarified by centrifugation at 12,000 x g for 10 minutes at 4°C and the supernatants were collected. Five microliters of the clarified extracts were spotted onto dry nitrocellulose strips and allowed to air dry. Positive controls containing known concentrations of human IgG were used as standards (Sigma). The strips were then incubated for 30 minutes in blocking solution (PBS containing 5% non-fat dry milk). The strips were transferred into blocking solution containing 5 µg/ml of a horseradish peroxidase-labeled anti-human IgG, Fc-specific goat antibody [Rockland, Gilbertsville, PA]. The strips were incubated with the anti-Ig antibody for 1 hour at room temperature while rocking. The strips were washed 3 times in PBS, pH 7.4, and developed using a DAB/H2O2 solution until color appeared on the positive control dots. Relative amounts of IgG in the sonicated extracts were determined by comparing the signal intensity with that of the positive controls. Detection of IgG molecules is simplified by the commercial availability of Fc-specific pre-conjugated antibodies, which allow a direct assay for fusion protein expression. Clone pMA2-TH-IgG-9 was chosen for expression and isolation studies.
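The dot blot above is read by eye against IgG standards. If spot intensities were instead measured (for example by densitometry, which the text does not describe), the same comparison becomes a standard-curve interpolation. The sketch below uses piecewise-linear interpolation and converts the amount in a 5 µl spot to a concentration; the standard amounts and intensities in the example are hypothetical.

```python
"""Estimate IgG in a spotted extract from a standard curve (sketch)."""
from bisect import bisect_left


def interpolate_amount(signal: float, standards: list[tuple[float, float]]) -> float:
    """Piecewise-linear interpolation of amount from signal.

    `standards` is a list of (amount_ng, signal) pairs whose signals increase
    with amount. Signals outside the curve raise ValueError instead of extrapolating.
    """
    pts = sorted(standards, key=lambda p: p[1])
    signals = [s for _, s in pts]
    if not signals[0] <= signal <= signals[-1]:
        raise ValueError("signal outside the range of the standards")
    i = bisect_left(signals, signal)
    if signals[i] == signal:
        return pts[i][0]
    (a0, s0), (a1, s1) = pts[i - 1], pts[i]
    return a0 + (signal - s0) * (a1 - a0) / (s1 - s0)


def ug_per_ml(amount_ng: float, spotted_ul: float = 5.0) -> float:
    """Convert ng in a spot to concentration (ng/ul is the same as ug/ml)."""
    return amount_ng / spotted_ul


if __name__ == "__main__":
    # Hypothetical densitometry readings for IgG standards (ng, arbitrary units).
    curve = [(0.0, 0.0), (10.0, 100.0), (50.0, 400.0)]
    ng = interpolate_amount(250.0, curve)
    print(f"{ng:.1f} ng per spot -> {ug_per_ml(ng):.1f} ug/ml in the extract")
```

Refusing to extrapolate mirrors the visual method: an extract brighter than the strongest standard can only be reported as "above the top standard" until it is diluted and re-spotted.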
The junction region (i.e., the region which joins the protein of interest with the affinity domain) present in pMA2-TH-IgG is shown in Figure 7. The first 5 amino acid residues shown comprise the carboxy-terminal end of the MBP (the phenylalanine is encoded by the conversion linker as described above). The hydrophilic spacer (Arg-Arg) and thrombin recognition site are boxed and labeled; the cleavage site for thrombin is indicated by the arrow placed between the Arg and Gly residues. The phenylalanine residue, the hydrophilic spacer and the thrombin site [Leu-Val-Pro-Arg-Gly (SEQ ID NOT I)] are encoded by the conversion linker. The conversion linker sequences also encode two arginine residues located immediately downstream of the thrombin recognition site; these arginines are followed by sequences comprising the hinge region of IgG. The pairs of arginine residues surrounding the thrombin site in the linker were designed to maximize exposure of the endoproteolytic site, increasing the efficiency with which the affinity domain is removed from the protein of interest.
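The junction residues described above can be read directly off the B1 strand of the conversion linker. The sketch below translates B1 with the standard genetic code, assuming the MBP reading frame enters the linker at B1's third base (the TTT codon that supplies the phenylalanine), and then splits the peptide between the Arg and Gly of the Leu-Val-Pro-Arg-Gly site. It models only the linker as synthesized, before the IgG fragment is inserted at the SalI site; the function names are illustrative.

```python
"""Translate the conversion linker and model thrombin cleavage (sketch)."""

B1 = "CGTTTCGCCGGCTGGTTCCGCGGGGTCGACGAATTCAGCTAGCA"  # SEQ ID NO:45

# Standard genetic code, bases ordered T, C, A, G at each codon position.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: _AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(_BASES)
               for j, b in enumerate(_BASES)
               for k, c in enumerate(_BASES)}


def translate(dna: str) -> str:
    """Translate complete codons from the first base; '*' marks a stop."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def thrombin_cleave(peptide: str, site: str = "LVPRG") -> tuple[str, str]:
    """Split at the first `site`, between its Arg (P1) and Gly (P1')."""
    start = peptide.find(site)
    if start < 0:
        raise ValueError("no thrombin site found")
    cut = start + len(site) - 1
    return peptide[:cut], peptide[cut:]


if __name__ == "__main__":
    junction = translate(B1[2:])          # frame assumed to start at the TTT codon
    print(junction)                       # FRRLVPRGRRIQLA
    upstream, downstream = thrombin_cleave(junction)
    print(upstream, "|", downstream)      # FRRLVPR | GRRIQLA
```

The upstream fragment shows what thrombin alone leaves on the protein of interest: the Phe, the Arg-Arg spacer and the Leu-Val-Pro-Arg remnant of the recognition site. That remnant is the residual recognition sequence the hydrophilic spacer design is meant to allow to be removed.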

The junction region of pMA2-TH-IgG can be easily modified to replace the existing hydrophilic spacer and/or endoprotease recognition site with other spacers and endoprotease sites; this is achieved by digestion of pMA2-TH-IgG with SalI and NgoMI and insertion of the desired sequences.

b) Expression And Affinity Purification Of The MBP/IgG Fusion Protein

pMA2-TH-IgG-9 was used to express the MBP/IgG fusion protein in E. coli. Bacteria (E. coli strain AG1) harboring the plasmid were grown and induced using protocols developed for the pMAL-p2 vector [Riggs in Curr. Protocols Mol. Biol., at p. 16,6T2 (1990)] with minor modifications which permitted maximum expression of the IgG carboxy fusion constructs. Briefly, all ampicillin-resistant colonies were grown in LB containing 100-150 µg/ml ampicillin to maintain plasmid stability. All cultures were grown at 37°C, shaking at 225-250 rpm. Glucose was not included in expression growth experiments because it has been shown to cause leakage of fusion proteins from the cell by making the outer membrane semi-permeable. The growth conditions and protocol for the most efficient expression of the pMA2-TH-IgG-9 fusion protein at the 1 liter scale were as follows: 100 ml of LB containing 120 µg/ml ampicillin (LB-Amp 120) in a 250 ml flask was inoculated with a single colony from a fresh plate. The culture was then grown […] LB-Amp 120 in a 2.8 liter culture flask (Fisher Scientific). Cultures were grown at 37°C with vigorous shaking (240-260 rpm) until an OD600 of 0.600 was reached. Cultures were then induced by the addition of 440 µl of 1 M IPTG, and incubation was continued for 2.5-3.0 hours.
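The IPTG addition is a simple C1V1 = C2V2 dilution. The final culture volume is not legible in this copy, so the sketch below treats it as a parameter; with an assumed 1.1 liters (a 1 liter culture plus the 100 ml starter, a hypothetical figure), 440 µl of 1 M IPTG gives about 0.4 mM, the final concentration used in Example 2.

```python
"""Final inducer concentration after adding a stock to a culture (sketch)."""


def final_conc_mM(stock_mM: float, added_ml: float, culture_ml: float) -> float:
    """Concentration after mixing `added_ml` of stock into `culture_ml` of culture."""
    return stock_mM * added_ml / (culture_ml + added_ml)


def stock_needed_ml(stock_mM: float, target_mM: float, culture_ml: float) -> float:
    """Volume of stock bringing `culture_ml` to `target_mM`, counting the added volume."""
    if target_mM >= stock_mM:
        raise ValueError("target must be below the stock concentration")
    return target_mM * culture_ml / (stock_mM - target_mM)


if __name__ == "__main__":
    culture_ml = 1100.0  # hypothetical: 1 liter culture + 100 ml starter
    print(f"{final_conc_mM(1000.0, 0.44, culture_ml):.3f} mM IPTG")
    print(f"{stock_needed_ml(1000.0, 0.4, culture_ml) * 1000:.0f} ul of 1 M IPTG for 0.4 mM")
```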

Induced cultures were harvested by centrifugation at 4°C in 500 ml bottles for 30 min at 4000 x g in a GSA rotor (Sorvall). The pelleted cells were then disrupted by treatment with lysozyme as described below; alternatively, the cells may be disrupted by sonication or the fusion protein may be isolated from osmotic shock fluid.

The pelleted cells were resuspended at 1/50 the original volume in Lysis Buffer (50 mM Tris-Cl, pH 8.0, 100 mM NaCl, 1 mM ZnCl2 and 10% sucrose). Freshly prepared lysozyme (10 mg/ml in 10 mM Tris-HCl, pH 8.0) was added to a final concentration of 0.5 mg/ml, and the solution was incubated on ice for thirty minutes with inversion every 5 minutes. The lysozyme-treated cells were pelleted at 15,000 x g for 30 minutes at 4°C in an SS34 rotor. The supernatant was pooled and stored at -20°C until affinity chromatography was performed.
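The centrifugation steps are specified in x g, while rotors are set in rpm. The standard relation RCF = 1.118 × 10^-5 × r × rpm², with r in centimeters, converts between them. The rotor radii below are assumptions (maximum radii of roughly 14.5 cm for a GSA rotor and 10.7 cm for an SS34 rotor) and should be checked against the rotor manual before use.

```python
"""Convert between relative centrifugal force and rotor speed (sketch)."""
import math

K = 1.118e-5  # RCF = K * radius_cm * rpm**2

# Assumed maximum radii (cm); verify against the manufacturer's rotor data.
ROTOR_RADIUS_CM = {"GSA": 14.5, "SS34": 10.7}


def rcf(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) at the given speed and radius."""
    return K * radius_cm * rpm ** 2


def rpm_for(rcf_g: float, radius_cm: float) -> float:
    """Rotor speed needed to reach `rcf_g` at `radius_cm`."""
    return math.sqrt(rcf_g / (K * radius_cm))


if __name__ == "__main__":
    for rotor, g in (("GSA", 4000.0), ("SS34", 15000.0)):
        speed = rpm_for(g, ROTOR_RADIUS_CM[rotor])
        print(f"{rotor}: {g:,.0f} x g ~ {speed:,.0f} rpm")
```

Because the force scales with radius, quoting x g rather than rpm is what makes the harvest and clarification steps transferable between rotors.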

Immobilized Protein A was used to isolate the carboxy-terminal IgG fusion proteins. Immobilized Protein A was obtained from two manufacturers: Protein A Sepharose-6B (Pharmacia) and Affinica Protein A Agarose (Schleicher and Schuell). Disposable 10 ml, 1.0 cm diameter Affinica columns (Schleicher and Schuell) were used to hold 1-2 ml of the Protein A matrix. The protein was applied to the Protein A columns using binding and wash buffers comprising Tris-HCl (50 mM), phosphate (100 mM) or carbonate (100 mM) buffers at pH 8.0 containing 450 mM NaCl. Elution buffers included 0.1 M glycine-HCl, pH 2.3, and 0.4 M citrate buffer, pH 2.8. The citrate buffer was preferred because it did not interfere with measurement of protein concentration in the eluted fractions using the BCA protein assay (Pierce). Eluted fusion protein was neutralized with either 1/4 volume of 0.5 M sodium phosphate buffer, pH 7.7, or 1/10 volume of 1 M Tris-HCl, pH 9.0.

After cell extracts were prepared from induced bacterial cells (as described above), the resulting supernatants were prepared for chromatography by passage through a 0.44 micron filtration cartridge which included a prefilter matrix to prevent clogging [Uniflow Plus (Schleicher and Schuell)]. The supernatant was then brought to 450 mM NaCl by adding an appropriate volume of 5 M NaCl. The sample was applied to a 2.0 ml protein A column which had been pre-equilibrated with 5 volumes of binding buffer (50 mM Tris, pH 8.0, 450 […] gravity. The flow-through was collected and reapplied to the column. The column was washed with 10 volumes of binding buffer, and the fusion protein was eluted by the addition of 5 column volumes of elution buffer (0.04 M citrate buffer, pH 2.8). Fractions (1 ml) were collected into microcentrifuge tubes containing 100 µl of neutralizing buffer (1 M Tris-HCl, pH 9.0), and protein levels were monitored using a micro protein assay kit based on the interaction of brilliant blue G (Coomassie blue) with protein to produce a blue colored complex (Sigma).
Fractions containing eluted protein were pooled and run on a 4-15% precast SDS-PAGE mini gradient gel (Schleicher & Schuell) to determine purity. Samples were boiled for two minutes after adding an equal volume of 2X loading buffer (0.5 M Tris-HCl pH 6.8, 4% SDS, 20% glycerol and 0.01% bromphenol blue). Visual inspection of the PAGE gel after staining with Coomassie brilliant blue dye showed that the fusion protein was isolated in both monomeric and dimeric forms and was greater than 95% pure (gels run under non-denaturing conditions were used to estimate the percentage of protein present as a dimer). These results demonstrate that both monomeric and dimeric IgG hinge/Fc regions can bind to protein A. Furthermore, the results show that affinity purification of the fusion protein from total cellular extracts is specific for the MBP/IgG fusion protein.
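Bringing the clarified lysate to 450 mM NaCl with a 5 M stock is another mixing calculation, but here the lysate already contains salt. The sketch assumes the supernatant retains the 100 mM NaCl of the Lysis Buffer and uses a hypothetical 20 ml of lysate (1/50 of a 1 liter culture); it solves V_add = V (c_target - c_start) / (c_stock - c_target).

```python
"""Volume of concentrated salt stock needed to raise a sample's NaCl (sketch)."""


def stock_to_add_ml(sample_ml: float, start_mM: float, target_mM: float,
                    stock_mM: float) -> float:
    """Solve sample*start + x*stock = (sample + x)*target for x."""
    if not start_mM <= target_mM < stock_mM:
        raise ValueError("need start <= target < stock")
    return sample_ml * (target_mM - start_mM) / (stock_mM - target_mM)


if __name__ == "__main__":
    lysate_ml = 20.0  # hypothetical: 1 liter culture resuspended at 1/50 volume
    add = stock_to_add_ml(lysate_ml, 100.0, 450.0, 5000.0)
    print(f"add {add:.2f} ml of 5 M NaCl to {lysate_ml:.0f} ml of lysate")
```

Ignoring both the salt already present and the added volume would suggest 20 × 450/5000 = 1.8 ml, which would overshoot to roughly 500 mM NaCl.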

c) Cleavage Of The MBP/IgG Fusion Protein With Thrombin 

The affinity purified MBP/IgG fusion protein was cleaved with the endoprotease thrombin as follows. The eluted fusion protein was mixed with an equal volume of 2X thrombin cleavage buffer (50 mM Tris-HCl, pH 8.0, 300 mM NaCl, 5 mM CaCl2) and thrombin (Sigma) was added at a 1:100 molar ratio (thrombin:fusion protein). The digestion reaction was incubated at room temperature and 50 µl aliquots were removed at 5, 15, 30 and 60 minutes to determine the efficiency of thrombin digestion. The removed samples (50 µl) were added to 50 µl of 2X SDS reducing buffer (0.5 M Tris-HCl pH 6.8, 4% SDS, 2% mercaptoethanol, 20% glycerol and 0.01% bromphenol blue) and boiled immediately to inactivate the thrombin enzyme. Time course digestion samples were analyzed on a 10% SDS-PAGE gel. From visual inspection of the gel, it was estimated that 75% of the fusion protein was cleaved after 5 minutes and that complete digestion was achieved after 15 minutes of incubation in the presence of a 1:100 molar ratio of thrombin to fusion protein.
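Two quantities follow from this paragraph. First, a 1:100 molar ratio must be converted to a mass of enzyme, which depends on molecular weights not given here; the sketch assumes roughly 36 kDa for thrombin and a hypothetical 69 kDa for the MBP/IgG monomer (about 42.5 kDa of MBP plus about 26 kDa of hinge/Fc). Second, if cleavage is approximated as first-order in substrate (a simplification, not something the text claims), 75% cleavage at 5 minutes predicts about 98% at 15 minutes, consistent with the digestion appearing complete on the gel.

```python
"""Thrombin dose for a molar ratio, and a first-order cleavage estimate (sketch)."""
import math


def thrombin_ug(fusion_ug: float, fusion_kda: float, thrombin_kda: float,
                substrate_per_enzyme: float = 100.0) -> float:
    """Mass of thrombin for 1 enzyme per `substrate_per_enzyme` fusion molecules."""
    fusion_nmol = fusion_ug / fusion_kda  # 1 kDa = 1 ug/nmol, so ug / kDa = nmol
    return fusion_nmol / substrate_per_enzyme * thrombin_kda


def fraction_cleaved(t_min: float, ref_fraction: float, ref_t_min: float) -> float:
    """First-order model 1 - exp(-k t), with k fitted to one observed point."""
    k = -math.log(1.0 - ref_fraction) / ref_t_min
    return 1.0 - math.exp(-k * t_min)


if __name__ == "__main__":
    # A 2 mg sample with assumed MWs of 69 kDa (fusion) and 36 kDa (thrombin).
    print(f"{thrombin_ug(2000.0, 69.0, 36.0):.1f} ug thrombin")
    print(f"{fraction_cleaved(15.0, 0.75, 5.0):.3f} cleaved at 15 min")
```

Under these assumptions a 2 mg digest needs about 10 µg of thrombin, and the model leaves about 1.6% (1/64) uncleaved at 15 minutes, below what visual inspection of a stained gel would normally resolve.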

Following digestion with thrombin, a sample containing 2 mg of the digested MBP/IgG fusion protein was brought to 450 mM NaCl and was applied to a fresh protein A […] protein of interest (the MBP portion of the fusion protein). Dot blot analysis using anti-IgG-HRPO showed that the second protein A column efficiently removed the majority of the free IgG. Re-chromatography of the thrombin-digested sample on protein A removes free IgG domains from the cleaved fusion protein as well as any undigested MBP/IgG fusion protein.
The thrombin-IgG fusion plasmid pMA2-TH-IgG-9 represents an example of a hydrophilic spacer/specific endoprotease site organization that allows efficient proteolytic separation and isolation of the desired production molecule using affinity chromatography. Previously, researchers have reported problems cleaving MBP fusion proteins (Riggs, P., supra at 16.6T3). Steric hindrance from the amino-terminus of the protein of interest and solubility are the most common problems encountered when cleaving fusion proteins using the pMAL/Factor Xa expression system.

Solutions to this problem involve the denaturation of the fusion molecule with guanidinium HCl or 8 M urea before enzymatic cleavage (Riggs, supra). However, the use of harsh denaturants can significantly decrease or eliminate the functional activity of the desired protein. Alternatively, other proteases have been used that cleave fusion molecules more efficiently because the cleavage site lies towards the middle of the recognition sequence rather than following it (for example, thrombin, renin, Igase). However, these proteases do not generate authentic proteins, because amino acids contributed by the endoprotease recognition site remain on the protein of interest following endoprotease digestion.

In contrast, the hydrophilic spacers of the present invention physically separate the natural conformation of a desired molecule's carboxy-terminus from the designed proteolytic site and provide enhanced solubility because of their hydrophilic nature. The hydrophilic spacer permits the removal of any residual proteolytic recognition sequence that remains at the carboxy-terminus of the authentic protein after specific cleavage of the designed fusion protein. The arginine residue(s) present in the hydrophilic spacer provide a barrier that prevents the removal of residues from the carboxy-terminus of the authentic protein of interest by CPA (Ambler, supra), while allowing the removal of any amino acids derived from the endoprotease recognition site which remain on the carboxy-terminus of the protein of interest following endoprotease digestion of the fusion protein.
d) Cleavage Of The Fusion Protein While Immobilized On The Protein A Resin

Crystallographic studies of protein A bound to Fc fragments of IgG indicate that the major binding site on the IgG molecule occurs at the junction between the CH2 and CH3 regions of the IgG molecule [Deisenhofer, Biochem. 20:2361 (1981)]. These crystallization studies suggested that the endoproteolytic site located on the carboxy-terminal IgG fusion proteins of the present invention would be available for proteolytic cleavage while the fusion protein was immobilized on the protein A matrix. This hypothesis was tested by performing the following experiment.

Bacteria containing the pMA2-TH-IgG-9 plasmid were grown and induced as 
described in section b) above. Supernatant from cell extracts was prepared and applied to a 
protein A column as described above. The column was washed with binding buffer to 
remove any non-specific proteins bound to the column. Five column volumes of thrombin 
cleavage buffer were then applied to the column. The lower salt concentration present in the 
thrombin cleavage buffer did not release any of the bound fusion protein as determined by 
assaying the wash for the presence of IgG. Four column volumes of thrombin buffer 
containing enough thrombin for the cleavage of a maximally bound matrix (10 mg fusion 
protein/ml matrix or 10 ng thrombin/mg fusion protein) were then added to the column. The 
column matrix was gently shaken to create a suspension and the top and bottom of the 
column were sealed. The column was then placed on a rocker for 20 minutes at room 
temperature. 
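The on-column digestion is dosed for a saturated bed. A small sketch of that arithmetic, using the capacity and enzyme ratio given above (10 mg fusion protein per ml of matrix and 10 ng thrombin per mg of fusion protein), the four column volumes of thrombin buffer, and a 2 ml bed as in section b); the function name is illustrative.

```python
"""Thrombin needed to digest a fully loaded protein A bed (sketch)."""


def on_column_thrombin(bed_ml: float, capacity_mg_per_ml: float = 10.0,
                       ng_per_mg: float = 10.0,
                       buffer_cv: float = 4.0) -> tuple[float, float]:
    """Return (total thrombin in ng, ng/ml of thrombin in the cleavage buffer)."""
    bound_mg = bed_ml * capacity_mg_per_ml   # fusion protein on a saturated bed
    total_ng = bound_mg * ng_per_mg
    return total_ng, total_ng / (buffer_cv * bed_ml)


if __name__ == "__main__":
    total, conc = on_column_thrombin(2.0)
    print(f"{total:.0f} ng thrombin ({conc:.0f} ng/ml in 4 column volumes)")
```

Because both the bound protein and the buffer volume scale with bed size, the thrombin concentration in the cleavage buffer stays the same for any bed volume; only the total amount changes.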

The column was then placed upright and the cleavage buffer was collected. The column was then washed with 2 column volumes of wash buffer (50 mM Tris-HCl, pH 8.0, 100 mM NaCl, 20 mM EDTA). The flow-through containing the cleavage buffer and the wash buffer was pooled (these fractions contain the cleaved protein of interest). The IgG portion of the digested fusion protein was eluted as described above and all protein-containing fractions were analyzed on an SDS-PAGE gel. The results of the SDS-PAGE analysis showed that all fusion molecules were cleaved by this method (i.e., no intact fusion protein was eluted from the column following thrombin digestion) and that the resulting MBP was substantially purer than that from previous isolations as described in section b). That is, cleavage of the fusion protein while immobilized (bound) to the protein A resin eliminated the minor quantities of non-specific proteins that were present during the previously described […]
Cleavage of the fusion protein while immobilized to the column resin under mild conditions is a preferred method of protein isolation, as this eliminates the need to remove the free IgG domain from the protein of interest following protease digestion, thereby decreasing the number of processing steps.


EXAMPLE 2 

The Use Of The Kanamycin-Resistance Gene In Place Of The Ampicillin-Resistance Gene Improves The Yield of Periplasmic IgG Fusion Protein

Experiments growing and inducing bacteria containing pMA2-TH-IgG-9 showed that high concentrations of ampicillin were needed to maintain plasmid stability and to generate consistently high levels of the fusion protein. Glucose was eliminated from the growth media during the IPTG induction of pMA2-TH-IgG-9 to prevent the recombinant product from escaping from the periplasmic space into the media. These problems may be directly associated with the use of a modified pMAL vector, since induction of the unmodified commercial pMAL-p2 clones is noted by the manufacturer to be lethal, and our immunoblotting control experiments showed a consistent, low level of transcription in the absence of IPTG. Alternatively, the IgG portion of the fusion protein may contribute to this instability and leakage.

Ampicillin resistance is conferred by the presence of the β-lactamase gene product in the periplasmic space. Secretion of proteins into the periplasmic space occurs by a regulated transport process and may be rate-limiting for the production of secreted fusion products. Additionally, if the presence of the fusion protein makes the outer membrane unstable, the action of β-lactamase may be hindered. To eliminate these concerns, the ampicillin-resistance gene present on pMA2-TH-IgG was replaced by the kanamycin-resistance gene. The β-lactamase promoter was used to express the kanamycin-resistance gene. The kanamycin-resistance gene was chosen to replace the β-lactamase gene because the kanamycin-resistance gene product is not secreted.

The kanamycin-resistance gene was isolated from the eukaryotic vector pBK-RSV (Stratagene). Two micrograms each of pBK-RSV and pM-TH (described in Example 1a) were digested with BspHI to completion. The digestion products were concentrated by ethanol precipitation and run on a 1.5% low-melting temperature agarose (LMA) gel using […] containing the kanamycin-resistance gene from pBK-RSV and the approximately 5.6 kb pM-TH vector fragment were excised from the gel and digested with Gelase (Epicentre Technologies) according to the manufacturer's protocol. Two hundred nanograms of purified vector DNA was combined with 200 ng of insert DNA in a final volume of 20 µl and ligated at 17°C in the presence of T4 DNA ligase and 1 mM ATP. The ligation products were used to transform competent AG1 cells (Stratagene). Transformants were grown on plates containing 50 µg/ml kanamycin. Four clones were picked and analyzed by restriction enzyme digestion. All four clones contained the kanamycin-resistance gene in the desired orientation. The resulting plasmid was called pMA2TH-Kan.

To investigate whether replacement of the ampicillin-resistance gene with the kanamycin-resistance gene would lead to improved expression of periplasmic fusion proteins, the IgG domain was inserted downstream of the malE gene in pMA2TH-Kan to generate the pM-Col-K vector, which encodes an MBP/IgG fusion protein. The thrombin site and the hydrophilic spacer (Arg-Arg) were replaced with a hydrophilic spacer resistant to proteolytic cleavage (the control spacer) in order to eliminate any possible degradation of fusion protein during the quantitation experiments. This spacer was designed to give large proteins sufficient range of rotation around the Fc tail by including glycine and proline residues.

Insertion of the linker encoding the spacer was accomplished as follows. pMA2-TH vector (2.0 µg) was digested with SacI and NheI to remove the thrombin site. The digested plasmid was purified by ethanol precipitation. The oligonucleotide pair comprising ColF1: 5'-CGTTTAAAAAGAAACCGCGGGGCCCGGGTAC-3' (SEQ ID NO:47) and ColR1: 5'-CCGGGCCCCGCGGTTTCTTTTTAAACGAGCT-3' (SEQ ID NO:48) was annealed by incubating 10 µl of each oligonucleotide (100 µM) in 100 µl of hybridization solution (10 mM Tris-HCl, pH 8.0 and 50 mM NaCl) at 90°C for 10 min and then allowing the solution to cool to room temperature over a period of 2 hours. The KpnI/NheI-digested IgG PCR product (described below in Example 3) was ligated to an excess of the annealed ColF1/ColR1 oligonucleotide pair.
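As with the B1/B2 pair, the ColF1/ColR1 sequences can be checked in a few lines. The sketch below finds the annealed register of the two strands, reports the single-stranded 3' tails (GTAC is the 3' overhang left by KpnI, AGCT the one left by SacI), and translates ColF1 in the reading frame assumed for the B1 linker (starting at the TTT codon). Helper names are illustrative.

```python
"""Check the ColF1/ColR1 control-spacer linker (sketch)."""

COLF1 = "CGTTTAAAAAGAAACCGCGGGGCCCGGGTAC"  # SEQ ID NO:47
COLR1 = "CCGGGCCCCGCGGTTTCTTTTTAAACGAGCT"  # SEQ ID NO:48

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = "TCAG"  # standard genetic code, bases ordered T, C, A, G
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: _AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(_BASES)
               for j, b in enumerate(_BASES)
               for k, c in enumerate(_BASES)}


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def three_prime_tails(top: str, bottom: str) -> tuple[str, str]:
    """For a duplex whose two 5' ends are both recessed, return the unpaired
    3' tails of (top, bottom), using the longest n with
    revcomp(top[:n]) == bottom[:n]."""
    for n in range(min(len(top), len(bottom)), 0, -1):
        if revcomp(top[:n]) == bottom[:n]:
            return top[n:], bottom[n:]
    raise ValueError("strands do not anneal in this geometry")


if __name__ == "__main__":
    print(three_prime_tails(COLF1, COLR1))  # ('GTAC', 'AGCT')
    print(translate(COLF1[2:]))             # FKKKPRGPG
```

Unlike B1/B2, both ends of this duplex are 3' overhangs, one matching a SacI end and one a KpnI end, as the ligation into the SacI-cut vector and onto the KpnI-cut IgG fragment requires. The translated spacer contains the glycine and proline residues mentioned above and no Leu-Val-Pro-Arg-Gly thrombin site.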

Following the ligation reaction, the products were purified on a CHROMA SPIN-400 column (Clontech) and ligated into the SacI/NheI-digested pMA2TH-Kan vector. The ligation products were used to transform competent AG1 cells. Transformants were selected by the ability to grow in the presence of 50 µg/ml kanamycin. Kanamycin-resistant clones were screened for inserts and the ability to produce IgG as described above in Example 1 […] unique isolates. To provide a control plasmid, the KpnI/NheI IgG fragment was cloned into the ampicillin-resistant version of pMA2-TH using the above protocol, and this plasmid was designated pM-Col-A(1-4).

Figure 8 provides a schematic map of pM-Col-K. In Figure 8, the coding regions for the lac repressor (lacIq), MBP, the IgG fragment and the kanamycin-resistance gene are shown, and the direction of transcription is indicated by the arrows; selected restriction sites are indicated. The tac promoter is also indicated (open arrowhead). The location of the junction region is indicated.

The sequences comprising the junction region present in pM-Col-K and pM-Col-A are shown in Figure 9. Sequences comprising the carboxy-terminus of the MBP, the hydrophilic spacer, the control spacer and the amino-terminal portion of the IgG hinge region are indicated. The cleavage sites for the restriction enzymes SacI and KpnI are indicated by the arrowheads.

Bacteria harboring either pM-Col-K or pM-Col-A were induced by growth in the presence of IPTG in order to compare the amount of recombinant fusion protein expressed by each plasmid. The resulting fusion protein was affinity purified on protein A columns as described above. Briefly, overnight cultures (200 ml) of bacteria containing either pM-Col-K or pM-Col-A were grown in LB containing either 50 µg/ml kanamycin or 120 µg/ml ampicillin, respectively. These overnight cultures were used to inoculate 2.8 liter flasks containing 1 liter of the appropriate LB media prewarmed to 37°C. The cultures were grown at 37°C in a shaker incubator with a rotation speed of 260 rpm until an OD600 of 0.6 was reached. IPTG was added to a final concentration of 0.4 mM and growth was continued for 2.5 hours. The cells were harvested by centrifugation and lysed with lysozyme as described in Example 1. The lysates were adjusted to 450 mM NaCl, filtered and applied to 2 ml of prewashed immobilized Protein A in separate 10 ml disposable Affinica columns (Schleicher & Schuell). The columns were washed with 10 ml of binding buffer (20 mM Tris-HCl, pH 8.0, 450 mM NaCl, 5 mM EDTA) and the fusion protein was eluted with 100 mM sodium citrate buffer, pH 2.8. One milliliter fractions were collected and assayed for protein content using the BCA assay (Pierce). Fractions containing protein were pooled and the yield of fusion protein produced by the two plasmids was calculated. The results of these experiments are summarized below.

Recombinant fusion protein yields were increased two-fold using the […] 20 mg of fusion protein per liter of induced culture was isolated from the pM-Col-K vector. In comparison, only about 10 mg of fusion protein per liter of induced culture was produced by the pM-Col-A vector. In addition, growth rates were much slower for the pM-Col-A vector; bacteria containing this plasmid required nearly twice as much time to reach an OD600 of 0.6 as bacteria containing the pM-Col-K vector.

These results demonstrate that the use of the kanamycin-resistance gene is preferred for the production of fusion proteins which are secreted into the periplasmic space in prokaryotic hosts.



EXAMPLE 3 

Construction Of An IgG Affinity Domain 



The hinge and Fc portion of the human IgG molecule were isolated to provide DNA sequences encoding a protein domain which would allow for the affinity purification of fusion proteins (i.e., an affinity domain).

The IgG1-secreting human plasma cell line ARH-77 (ATCC CRL 1621) [Burk, et al. Cancer Res. 38:2508 (1978)] was used as the source of RNA for the isolation of cDNA clones encoding the hinge-CH2-CH3 (i.e., Fc) region of IgG1. ARH-77 cells were grown in RPMI 1640 medium (GIBCO) containing 10% FCS (GIBCO) in 125 ml tissue culture flasks (Fisher). The cells were allowed to grow to confluency and were then harvested by centrifugation at 300 x g in 50 ml conical tubes (Fisher). The cell pellet was washed with 40 ml PBS, pH 7.4, and resuspended in a final volume of 10 ml PBS, pH 7.4.

Total cellular RNA was isolated from the ARH-77 cell pellet using a Total RNA Isolation Kit (Clontech) according to the manufacturer's instructions. Briefly, 10 ml of denaturing solution (6 M guanidinium-HCl) was added to the pooled, washed cells (10 ml) and incubated for 10 minutes at room temperature. The following reagents were added in the stated order with gentle mixing: 1.0 ml 2 M NaOAc pH 4.5, 10 ml water-saturated phenol and 2 ml chloroform/isoamyl alcohol (29:1). The tube was shaken and stored on ice for 10 minutes. The tube was then centrifuged for 15 minutes in an SS34 rotor (Sorvall) at 5000 x g at 4°C. The aqueous phase (supernatant) was removed and 10 ml of isopropanol was added to precipitate the nucleic acids.

The tube was stored at -20°C overnight. The RNA was then pelleted by centrifugation […] solution and transferred to a 1.5 ml siliconized microcentrifuge tube (Fisher). The RNA was precipitated by the addition of 0.6 ml of isopropanol and incubation of the tube at -20°C for 1 hour. RNA was pelleted gently in an Eppendorf microcentrifuge at 5,000 x g for 10 min at 4°C, resuspended in 75% ethanol and stored at -20°C until needed.
Poly A+ RNA was isolated from the above total RNA preparation as follows. The total RNA was pelleted by centrifugation in a microfuge; the pellet was partially air-dried. The RNA was resuspended in 1.0 ml DEPC-treated H2O. Four hundred microliters were removed and the RNA was saved as an EtOH precipitate by adding 800 µl ethanol; the tube was stored at -20°C. The remaining 600 µl were brought to 955 µl by adding 355 µl TE buffer, pH 8.0. Magnetic beads covalently linked to streptavidin and biotinylated oligo(dT) [Magna Poly AAA+ RNA Isolation Kit (Clontech)] were then used to isolate the poly A+ RNA. The RNA suspension (955 µl) was incubated at 65°C for 5 min, 25 µl of sample buffer (provided in the kit) and 20 µl of biotinylated oligo(dT) were added, and the reaction was cooled to 4°C.

To this mixture, magnetic streptavidin beads were added and the mixture was incubated at room temperature for 10 min, according to the manufacturer's protocol. The tube was placed against a magnetic rack to allow separation of the poly A+ RNA-magnetic streptavidin complexes. The complexes were washed 3 times with binding buffer (provided by the kit) while in the magnetic rack. Poly A+ mRNA was released from the magnetic beads by washing the beads with 1.5 ml of H2O. The eluted poly A+ RNA was divided into three aliquots and stored as ethanol precipitates at -20°C until used.

First strand cDNA was synthesized using the poly A+ RNA isolated above and a First Strand Synthesis Kit (Stratagene). Briefly, a single tube of the isolated poly A+ RNA was pelleted by centrifugation at 12,000 x g in a microfuge at 4°C. The poly A+ RNA was resuspended in 32 µl of DEPC-treated H2O, and 3 µl of oligo dT primer (provided in the kit) was added. The tube was heated to 65°C for 5 min and allowed to cool slowly to room temperature. 10X buffer, RNase Block, dNTPs and 20 units of MMLV reverse transcriptase were added as indicated by the manufacturer (all reagents were provided in the kit), and the reaction was incubated at 37°C for 1 hour. Completed reactions were stored in 3 µl aliquots at -20°C in a Stratacooler (Stratagene).

A cDNA clone containing the hinge and Fc domains was isolated by PCR amplification of the first strand cDNA. Reagents for PCR amplification were obtained as […] Bio-Labs), Pfu polymerase (Stratagene) and synthetic oligonucleotides (National Biosciences). Pfu polymerase was used to generate functional clones because of its increased fidelity compared to Taq polymerase. Temperature cycling was performed using a Perkin-Elmer thermocycler (N801-0150).

Figure 10 provides the nucleotide and amino acid sequence of the human IgG1 hinge/Fc region (SEQ ID NOS:49 and 50, respectively). In Figure 10, selected amino acid residues are numbered to facilitate the discussion below (the initiator methionine located at the amino-terminus of the molecule in the VH domain is residue number 1).

The IgG hinge/Fc domain was amplified using three different oligonucleotides to prime separate PCR reactions at the 5' end of the hinge region. All three 5' oligonucleotides contain approximately 20 bases of sequence complementary to the IgG hinge region linked to nucleotides comprising either an NgoMI, KpnI or SalI restriction site at the 5' end of the oligonucleotide primer. The sequences of these three 5' primers are shown in Figure 11.

In Figure 11, the bases present in the three 5' primers which correspond to sequences located in the human IgG1 hinge region are underlined. The location of the restriction sites present at the 5' end of the primers is indicated and the cleavage site is marked by an arrowhead.

As shown in Figure 11, these three oligonucleotide primers introduce different amino acids at the 5' end of the hinge region. The IG5NGO oligonucleotide (SEQ ID NO:51) contains the recognition site for NgoMI and introduces two arginine residues immediately upstream of the histidine residue located at amino acid position 225 in the human IgG1 molecule. The IG5ARS oligonucleotide (SEQ ID NO:52) contains the recognition site for SalI and introduces two arginine residues at the 5' end of the hinge region (immediately upstream of the threonine residue located at amino acid position 226 in the human IgG1 hinge region). The IG5KPN oligonucleotide (SEQ ID NO:53) contains a KpnI site and introduces a glycine residue in the hinge region (immediately upstream of the threonine residue located at amino acid position 224 in the human IgG hinge region).

A single 3' oligonucleotide was used to prime the PCR reactions. This oligonucleotide
was termed IG3NHE and comprises the following sequence:

5'-CCCCCGCTAGCGTCATTTACCCGGAGACAGGGAGA-3' (SEQ ID NO:54). The
IG3NHE oligonucleotide contains an NheI site to allow for the directional cloning of the
isolated PCR products. Sequences present in the IG3NHE oligonucleotide which
The resulting 0.7 kb PCR products contain three variations of the hinge domain. They
are designed to give the naturally occurring proteolytic cleavage site of the hinge maximum
exposure (see Figure 10). The SalI-IgG product was designed to be very hydrophilic; this
product is generated using primers comprising SEQ ID NOS:52 and 54. Two arginine
residues encoded by the 5' primer (SEQ ID NO:52) were used to replace the naturally
occurring Thr(224) and His(225) to make the region more hydrophilic. In the KpnI-IgG
fragment, a glycine residue encoded by the 5' primer (SEQ ID NO:53) replaces the naturally
occurring Lys(223) to allow for maximum rotation of the protein of interest and the
attached endocleavage site; Thr(224) and His(225) were not disturbed. The KpnI-IgG
fragment is generated using primers comprising SEQ ID NOS:53 and 54.

In the NgoMI-IgG fragment, the threonine (at position 224) was replaced with an
arginine residue to make the hinge region more hydrophilic. A glycine codon (GGG) can be
created by using a cloning linker that terminates with GG and has an NgoMI-compatible 5'
overhang to provide additional flexibility.

Thermocycling was performed using the following conditions: 95°C for 1 min 30 sec,
37°C for 1 min and 72°C for 2 min sec, for 30 cycles. Following amplification, each PCR
product was isolated on a low-melt agarose gel in order to remove primers and incomplete
products. Nearly 100% recovery of products from the gel was accomplished using Gelase
(Epicenter), an enzyme that degrades agarose to saccharides, following the manufacturer's
protocol. After treatment with Gelase, the PCR fragments were isolated by EtOH
precipitation. The IgG fragments (i.e., the NgoMI/NheI-, SalI/NheI- or KpnI/NheI-IgG
fragments) were digested with NgoMI and NheI, SalI and NheI, or KpnI and NheI, and then
inserted into the desired vectors digested with the appropriate restriction enzymes.

EXAMPLE 4 

Construction Of Fusion Protein Expression
Vectors For Use In A Variety Of Host Cell Types

To produce the fusion proteins comprising the hydrophilic spacers of the present
invention in any desired host cell, expression vector constructs have to be made that will
satisfy the various genetic requirements of the system. Genes encoding the protein of interest
need to be inserted into vectors containing the appropriate transcription and translation
partner (e.g., the hinge and Fc domain of IgG) via the chosen hydrophilic spacer and
endoprotease site. This configuration may be achieved by modification of any commercially
available expression vector using standard techniques of molecular biology.

Synthetic linkers encoding the desired hydrophilic spacer and endoprotease site are
used to join the 3' end of the gene encoding the protein of interest to the 5' end of the IgG
gene fragment. This construction must maintain a single open reading frame so that the
hydrophilic spacer, endoprotease site and affinity domain are properly expressed (i.e., no
premature stop codons are generated and no shifts in reading frame occur). Examples are
provided herein demonstrating how such constructs are made in order to generate vectors
suitable for expression of fusion proteins in prokaryotic cells and in eukaryotic cells such as
mammalian cells or insect cells.
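To make the single-open-reading-frame requirement concrete, here is a minimal Python sketch that joins coding parts head to tail and flags either a frame shift or a premature in-frame stop codon. The helper name `single_orf_problems` and all of the sequences are hypothetical illustrations, not parts of the vectors described here; only the standard stop codons TAA, TAG and TGA are assumed.

```python
# Hypothetical helper: check that joined coding parts form one open reading frame.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def single_orf_problems(parts):
    """Join DNA parts head to tail and report reading-frame problems.

    Only the final codon may be a stop codon; an in-frame stop anywhere
    earlier, or a total length that is not a multiple of 3, is reported.
    """
    seq = "".join(parts).upper()
    problems = []
    if len(seq) % 3:
        problems.append(f"length {len(seq)} is not a multiple of 3 (frame shift)")
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    for number, codon in enumerate(codons[:-1], start=1):
        if codon in STOP_CODONS:
            problems.append(f"premature stop {codon} at codon {number}")
    return problems


# Toy parts (hypothetical): protein of interest, spacer, start of affinity domain.
print(single_orf_problems(["ATGGCTGCT", "CGAAAGAAG", "GGTACCTAA"]))  # []
print(single_orf_problems(["ATGGCTGCTTAA", "CGAAAGAAG"]))  # ['premature stop TAA at codon 4']
print(single_orf_problems(["ATGGCTGC", "CGAAAGAAG"]))  # one frame-shift problem
```

In practice the same check would be run on the actual junction sequence of each construct before it is built.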

All of these exemplary vectors use the KpnI/NheI-IgG fragment (described above in
Example 3) as the carboxy-terminal fusion partner, but it is noted that the invention is not
limited to the use of this particular affinity domain. These exemplary vectors do not specify a
particular endoprotease site or a particular hydrophilic linker sequence to be used. These
elements are selected based on the amino acids present at the naturally occurring carboxy
terminus of the protein of interest and on proteolytic susceptibility, as discussed in the
Description of the Invention above.

a) Prokaryotic Expression Vectors: Construction Of pTVkIg-1 

The expression vector pTVkIg-1 was constructed to allow the expression of fusion
proteins containing the hydrophilic spacers in prokaryotic hosts such as E. coli. This vector
contains the strong tac promoter to allow for high-level transcription and the lacO operator to
allow for transcriptional regulation in the presence of the lac repressor. The lacIq repressor
gene encoded by pTVkIg-1 allows for regulation of expression in any E. coli strain. An
optimized ribosome binding site is present to allow for efficient initiation of translation.

To construct pTVkIg-1, the commercially available vector pSE380 (Invitrogen) was
modified as follows. The Superlinker cloning site of pSE380 was removed by digesting
pSE380 with NcoI and HindIII. A polylinker containing multiple restriction sites was inserted
by ligation of the following annealed oligonucleotide pair: VNH1:

5'-CATGGACTGAAAGCTTGACGGTACCTGAGCTAGCT-3' (SEQ ID NO:55) and VNH2:
5'-AGCTAGCTAGCTCAGGTACCGTCAAGCTTTCAGTC-3' (SEQ ID NO:56). When
this pair of oligonucleotides is annealed, an overhang compatible with NcoI ends is created at
one end and an overhang compatible with HindIII ends at the other. Generation of the modified
vector is confirmed by restriction analysis (absence of the deleted Superlinker sites and presence
of NcoI, HindIII, KpnI and NheI), and the modified vector is designated pST-1.
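The VNH1/VNH2 duplex can be checked for consistency with a short Python sketch. It reads the garbled "CACiG" in the printed VNH2 sequence as CAGG, which is what perfect pairing with VNH1 requires, and it models only the annealed duplex, not the ligation into pSE380. The helper names are hypothetical.

```python
# Consistency check of the VNH1/VNH2 polylinker duplex (sequences as given above).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


VNH1 = "CATGGACTGAAAGCTTGACGGTACCTGAGCTAGCT"  # SEQ ID NO:55
VNH2 = "AGCTAGCTAGCTCAGGTACCGTCAAGCTTTCAGTC"  # SEQ ID NO:56

SITES = {"HindIII": "AAGCTT", "KpnI": "GGTACC", "NheI": "GCTAGC"}


def five_prime_overhangs(top, bottom, n=4):
    """If both strands carry an n-base 5' overhang and the remaining bases
    pair perfectly, return the two overhangs; otherwise return None."""
    if top[n:] == revcomp(bottom[n:]):
        return top[:n], bottom[:n]
    return None


print(five_prime_overhangs(VNH1, VNH2))  # ('CATG', 'AGCT')
print({name: site in VNH1 for name, site in SITES.items()})
# A vector NcoI end (C^CATGG) supplies the leading C, so CCATGG is restored.
print("CCATGG" in "C" + VNH1)  # True
```

CATG is the NcoI-compatible end and AGCT the HindIII-compatible end; the HindIII, KpnI and NheI sites lie in the double-stranded part of the polylinker.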
The sequences encoding the hinge and Fc (CH2 and CH3) regions of the human IgG1
molecule contained on a 0.7 kb KpnI/NheI fragment (described in Example 3) were inserted
into pST-1 as follows. pST-1 was digested with KpnI and NheI, and the 0.7 kb KpnI/NheI
IgG fragment was ligated into the digested pST-1 vector using a 2:1 ratio of insert to vector.
A clone containing the IgG insert was confirmed by restriction analysis and was designated
pSTIg-1. A map of pSTIg-1 is provided in Figure 12.

In Figure 12, the location of the trc promoter, the KpnI/NheI IgG fragment, the
ampicillin-resistance gene and the lacIq gene are indicated; selected restriction sites are also
indicated.

To provide a signal sequence which is efficiently utilized by prokaryotic cells, a signal
peptide sequence derived from the bacterial phosphatase (pho) gene was then inserted into
pSTIg-1. This improved signal peptide sequence contains an NcoI site at the ATG codon and a
HindIII site at the 3' end. These engineered sites allow the insertion of the signal sequence
into the expression vector and allow the addition of sequences encoding the desired protein
of interest at the 3' end of the signal peptide sequence. When secreted eukaryotic proteins of
interest are to be expressed in prokaryotic hosts, the naturally occurring signal sequence is
deleted from the eukaryotic gene and is replaced with the modified pho signal sequence.

The pho signal sequence was generated by annealing the following four
oligonucleotides together: pho F1: 5'-CATGAAACAAAGCACTATTGCACTGGCTGTC-3'
(SEQ ID NO:57); pho F2: 5'-TTACCGTTACTGTTTACCCCTGTGACAAA-3' (SEQ ID
NO:58); pho R1: 5'-AGCTTTTGTCACAGGGGTAAACAGTAACGGTAAGACAGC-3'
(SEQ ID NO:59); pho R2: AGTGCAATAGTGCTTTGTTT-3' (SEQ ID NO:60).
Oligonucleotides pho F2 and pho R2 contained 5'-phosphate groups; pho F1 and pho R1 were
nonphosphorylated.

Figure 13 shows the double-stranded sequence generated by annealing of the four pho
oligonucleotides; the amino acid sequence encoded by the annealed oligonucleotides is shown
below the nucleotide sequence; the amino acid sequence of the pho signal sequence is also
listed in SEQ ID NO:61.
Annealing was accomplished as follows. Each of the four oligonucleotides was
suspended at a concentration of 40 in 50 mM Tris-HCl, pH 8.0, 20 mM KCl and 1 mM
EDTA. Twenty-five microliters of each oligonucleotide solution were combined, heated to
WC and then allowed to cool slowly to room temperature over 120 min. The reaction was
then placed at \ TC, and MgCl2 (10 mM final concentration), ATP (1 mM final concentration)
and T4 DNA ligase were added. The ligation reaction was incubated for 1 hour at room
temperature and then stored at -20°C.

To insert the pho signal sequence into pSTIg-1, the vector was digested with NcoI and
HindIII. A 3-fold molar excess of the annealed signal sequence and the digested pSTIg-1
vector were ligated together in a final reaction volume of 20 µl using T4 DNA ligase at I7X.
Insertion of the signal sequence into pSTIg-1 was confirmed by restriction analysis (lack of
the NcoI site). The resulting vector containing the pho signal sequence and the IgG fragment
was designated pTVkIg-1. A map of pTVkIg-1 is shown in Figure 14.

In Figure 14, the location of the trc promoter, the pho signal sequence, the junction
region, the KpnI/NheI IgG fragment, the ampicillin-resistance gene and the lac repressor
(lacIq) gene are indicated; selected restriction sites are also indicated.

As shown in Figure 15, pTVkIg-1 contains a HindIII and a KpnI site between the pho
signal peptide sequences and the IgG sequences to allow for the insertion of sequences
encoding the desired protein of interest. Additionally, when desired, sequences encoding a
hydrophilic spacer and endoprotease site may be inserted into the junction region (see Figures
14 and 15) between the 3' end of the desired protein and the 5' end of the IgG nucleic acid
sequences.

The pTVkIg-1 vector was engineered to provide a number of advantages for the
expression of fusion proteins. The pho signal peptide or secretion sequence is followed by a
stop codon so that the downstream IgG sequences are not expressed in clones which lack an
insert encoding the protein of interest (see Figure 15). This design allows for the expression
screening of clones containing sequences encoding the protein of interest inserted into the
HindIII and KpnI sites of pTVkIg-1 using anti-Fc antibodies. Only those clones containing
sequences encoding the protein of interest correctly joined to sequences encoding the
hydrophilic spacer and affinity domain will result in the production of the Fc portion of the
IgG molecule. Sequences encoding the protein of interest can be inserted into the HindIII and
KpnI sites of pTVkIg-1 using a variety of techniques including the use of linkers
restriction sites in a PCR. Additionally, the HindIII site in pTVkIg-1 can be made blunt by
incubation of the digested plasmid with the Klenow fragment of E. coli DNA polymerase and
dNTPs. This fill-in reaction will produce a blunt end which is in frame with the desired
codons in the protein of interest.
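The following sketch, under stated assumptions, shows why the pho signal reads into a stop codon in the empty vector and why the Klenow-filled HindIII end finishes on a complete codon. It assumes that pho F1 and pho F2 are exactly as printed above and that the sequence immediately downstream of the HindIII site is the AGCTTGACGGTACC stretch of the VNH1 polylinker; the helper names are hypothetical.

```python
# Reading frame across the pho signal / polylinker junction (sketch).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq):
    """Translate the complete codons of seq in frame 0 ('*' marks a stop)."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


# Top strand of annealed pho F1 + pho F2, as printed in the text.
PHO_TOP = "CATGAAACAAAGCACTATTGCACTGGCTGTC" + "TTACCGTTACTGTTTACCCCTGTGACAAA"
# Assumed vector sequence after the HindIII junction (from the VNH1 polylinker).
VECTOR_AFTER_HINDIII = "AGCTTGACGGTACC"


def hindiii_fill_in(top):
    """Cut A^AGCTT, then let Klenow fill the recessed 3' end: the top strand
    gains the 4 bases opposite the 5'-AGCT overhang and becomes blunt."""
    cut = top.index("AAGCTT") + 1
    return top[:cut] + "AGCT"


no_insert = PHO_TOP + VECTOR_AFTER_HINDIII
blunted = hindiii_fill_in(no_insert)
print(translate(no_insert[1:]))  # MKQSTIALAVLPLLFTPVTKA*RY  (frame starts at the ATG)
print(translate(blunted[1:]))    # MKQSTIALAVLPLLFTPVTKA
print((len(blunted) - 1) % 3)    # 0: the blunt end closes a complete codon
```

In this model the empty vector reads ...Thr-Lys-Ala followed directly by TGA, consistent with the stop codon placed after the signal sequence, while the filled-in end stops after the Ala codon GCT, so a blunt insert that begins with a complete codon stays in frame.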
The sequences encoding the IgG affinity domain were designed so that the KpnI site at
the 5' end of the IgG sequences introduces a glycine residue into the hinge region of the IgG
sequences. This provides flexibility to the hydrophilic region, allowing for enhanced
accessibility of this region of the fusion protein to the endoprotease. The use of NheI as the
restriction site at the 3' end of the IgG sequence provides an additional stop codon to prevent
any read-through translation.
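The additional stop codon supplied by the NheI site can be checked directly from the IG3NHE primer (SEQ ID NO:54). This sketch reverse-complements the primer to obtain the sense strand and translates it in the frame that reproduces the Ser-Leu-Ser-Pro-Gly-Lys carboxy terminus of human IgG1; that choice of frame is an assumption made for illustration, not something stated in the text.

```python
# Stop codons at the 3' end of the IgG fragment, derived from the IG3NHE primer.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq, frame=0):
    """Translate complete codons starting at offset `frame` ('*' = stop)."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3))


IG3NHE = "CCCCCGCTAGCGTCATTTACCCGGAGACAGGGAGA"  # written 5'->3' (antisense primer)
coding = revcomp(IG3NHE)                        # sense strand covered by the primer
print(coding)                 # TCTCCCTGTCTCCGGGTAAATGACGCTAGCGGGGG
print(translate(coding, 2))   # SLSPGK*R*RG
tag = coding.index("GCTAGC") + 2                # the TAG inside the NheI site
print((tag - 2) % 3 == 0)     # True: that TAG is in the same frame
```

In this frame the natural TGA stop follows the terminal Lys, and a second in-frame stop (TAG) lies inside the GCTAGC NheI site. Because that TAG falls within the CTAG overhang, it is preserved when the NheI end is ligated to any compatible CTAG end.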

b) Eukaryotic Expression Vectors 

i) Mammalian Expression Vectors: Construction
Of pTVMam-Ren 

Numerous eukaryotic expression vectors are currently available. Most provide for high
levels of constitutive transcription from mammalian enhancer/promoter sequences and contain
appropriate transcription termination and polyadenylation signal sequences. Sequences which
allow for the replication of the vector in appropriate mammalian cell lines (e.g., the SV40
origin of replication and COS cells) may be present. Sequences encoding a selectable marker,
such as the neo gene, may be utilized to allow for the isolation of stable mammalian cell lines
expressing the vector sequences. The pcDNA3 vector (Invitrogen) was used to illustrate the
modification of a mammalian expression vector to allow for the production of the improved
fusion proteins of the present invention.

The NruI and KpnI sites are deleted from pcDNA3 to prepare for the construction of
pTVMam-Ren. The NruI site is eliminated by ligating a self-annealing 8-mer that codes for
the NotI recognition sequence [GCGGCCGC (NEB)] into NruI-digested pcDNA3. The KpnI
site is eliminated by treating the modified pcDNA3 vector with KpnI followed by treatment
with T4 DNA polymerase at 25°C in the presence of dNTPs. Religation (i.e.,
circularization) of the blunted vector results in the loss of the KpnI site.
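Both site-removal steps can be modelled on short sequences. This sketch uses hypothetical toy sequences rather than the real pcDNA3 context and assumes the standard cut positions (NruI TCG^CGA, blunt; KpnI GGTAC^C, leaving 4-base 3' overhangs that T4 DNA polymerase removes with its 3'->5' exonuclease). The helper names are hypothetical.

```python
# Sketch of NruI and KpnI site removal on hypothetical toy sequences.
NRUI, NOTI, KPNI = "TCGCGA", "GCGGCCGC", "GGTACC"


def insert_at_blunt_cut(seq, site, cut_offset, insert):
    """Cut at the first `site` with a blunt cut `cut_offset` bases into it
    (NruI: TCG^CGA, offset 3) and ligate a blunt `insert` into the gap."""
    pos = seq.index(site) + cut_offset
    return seq[:pos] + insert + seq[pos:]


def blunt_by_trimming_3prime_overhangs(seq, site, top_cut, overhang_len):
    """Cut at `site` (top strand cut `top_cut` bases into it), remove both
    3' overhangs and religate the blunt ends. For KpnI (GGTAC^C) the
    4-base GTAC overhang is lost, so GGTACC becomes GC."""
    pos = seq.index(site)
    start = pos + top_cut - overhang_len
    return seq[:start] + seq[pos + top_cut:]


no_nrui = insert_at_blunt_cut("AAATCGCGATTT", NRUI, 3, NOTI)
print(no_nrui, NRUI in no_nrui, NOTI in no_nrui)  # AAATCGGCGGCCGCCGATTT False True
no_kpni = blunt_by_trimming_3prime_overhangs("CCCGGTACCGGG", KPNI, 5, 4)
print(no_kpni, KPNI in no_kpni)                   # CCCGCGGG False
```

The NotI 8-mer is palindromic, which is why a single oligonucleotide can self-anneal into a blunt duplex; inserting it splits TCGCGA so the NruI site is not regenerated.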

The modified pcDNA3 vector is then ready for insertion of the KpnI/NheI IgG
fragment (described in Example 3) into a unique XbaI site located at the 3' end of the
multiple cloning site (i.e., polylinker) in pcDNA3. The NheI end present on the KpnI/NheI
pcDNA3 vector (this ligation will destroy both the NheI and XbaI sites). The KpnI end on
the IgG fragment is connected to the upstream XbaI overhang through the use of a linker
which contains a single-stranded extension at the 5' end which is compatible with XbaI ends
and a single-stranded extension at the 3' end which is compatible with KpnI ends. The linker
sequences also encode a hydrophilic spacer and an endoprotease site.
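Why the NheI-to-XbaI ligation destroys both sites can be shown in a few lines. NheI (G^CTAGC) and XbaI (T^CTAGA) leave the same 5'-CTAG overhang, so the joined strand keeps the outer base from each partner. The helper name is hypothetical.

```python
# Which sites survive when CTAG-overhang ends from NheI and XbaI are joined?
NHEI, XBAI = "GCTAGC", "TCTAGA"  # both cut X^CTAGY, leaving 5'-CTAG overhangs


def hybrid_junction(left_site, right_site):
    """Top strand at the ligation point: the base left of the cut from the
    left partner, the shared CTAG overhang, then the last base of the
    right partner's site."""
    return left_site[0] + "CTAG" + right_site[5]


downstream = hybrid_junction(NHEI, XBAI)  # IgG NheI end joined to the vector XbaI end
print(downstream, NHEI in downstream, XBAI in downstream)  # GCTAGA False False
print(hybrid_junction(XBAI, XBAI) == XBAI)                 # True: like ends restore the site
```

The hybrid GCTAGA is recognized by neither enzyme, so the IgG fragment cannot be released again by NheI or XbaI at that junction.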

A suitable linker (termed the 5' linker) is formed by annealing the oligonucleotide pair
XKf2 and XKr2 (SEQ ID NOS:62 and 63, respectively) together to generate the following
double-stranded sequence:

5'-CTAGCTGATCGCGAAAGAAGCTGCCGTTCCACCTGCTGGTGTACGGTAC-3' (XKf2)
    3'-GACTAGCGCTTTCTTCGACGGCAAGGTGGACGACCACATGC-5' (XKr2)

The above linker is compatible with NheI ends at the 5' end and with KpnI ends at the 3'
end; this allows the linker to be inserted into the upstream XbaI site of pcDNA3 and allows
the linker to be ligated to the KpnI/NheI IgG fragment through the KpnI ends; the NheI end
present on the IgG fragment is capable of ligation into the downstream XbaI site on pcDNA3.
The above linker encodes a hydrophilic spacer comprising the sequence Arg-Lys-Lys (SEQ
ID NO:17), a penultimate enhancer (leucine) and the recognition site for the endoprotease
renin [Pro-Phe-His-Leu-Leu-Val-Tyr (SEQ ID NO:3)].
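The XKf2/XKr2 duplex and the peptide it encodes can be checked with a short script. This sketch uses the two sequences exactly as printed (XKr2 written 3' to 5'), confirms the CTAG and GTAC single-stranded ends, and translates from the NruI blunt end (TCG^CGA). The constant names are our own.

```python
# Check the 5' linker (XKf2/XKr2, as printed above) and the peptide it encodes.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

XKF2 = "CTAGCTGATCGCGAAAGAAGCTGCCGTTCCACCTGCTGGTGTACGGTAC"   # 5'->3'
XKR2_3_TO_5 = "GACTAGCGCTTTCTTCGACGGCAAGGTGGACGACCACATGC"      # 3'->5'


def translate(seq):
    """Translate the complete codons of seq in frame 0."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


end = 4 + len(XKR2_3_TO_5)
paired = XKF2[4:end]
print(all(PAIR[a] == b for a, b in zip(paired, XKR2_3_TO_5)))  # True
print(XKF2[:4], XKF2[end:])      # CTAG GTAC  (XbaI/NheI- and KpnI-compatible ends)
blunt = XKF2.index("TCGCGA") + 3  # NruI cuts TCG^CGA
print(translate(XKF2[blunt:]))   # RKKLPFHLLVYG
```

The translation Arg-Lys-Lys, Leu, Pro-Phe-His-Leu-Leu-Val-Tyr matches the spacer, penultimate enhancer and renin site described above; the trailing Gly is the hinge glycine contributed by the KpnI site.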

The 5' linker used to join the IgG fragment to the vector and provide the spacer and
endoprotease site may be designed such that additional arginine, lysine and tyrosine residues
are placed upstream of the endoprotease site. Insertion of the IgG fragment into the XbaI
site in the above-described manner allows the remaining sites in the multiple cloning site to
be utilized for insertion of sequences encoding the gene of interest. The above-described
linker contains an NruI site (TCGCGA) which produces a blunt end upon digestion. The
resulting blunt end has CGA as its first three nucleotides, which encode the first arginine
residue of the hydrophilic spacer. The sequence following this CGA can be varied to
generate the desired hydrophilic linker and endoprotease site.

To construct pTVMam-Ren, 2 µg of pcDNA3 is digested with XbaI. The 5' linker is
formed by annealing equimolar ratios of unphosphorylated XKf2 and XKr2 oligonucleotides
at a concentration 10 in a total volume of 100 µl as described in Example 1a. The
annealed XKf2/XKr2 oligonucleotide pair is ligated in excess (10 fold) to the KpnI/NheI




compatible XbaI overhang on the 5' linker) are removed by digestion with NheI and passing
the reaction products through a CHROMA SPIN 100 column (Clontech). The purified insert
is then ligated to 200 ng of XbaI-digested pcDNA3 vector using a 3:1 insert:vector ratio.
The ligation products are used to transform competent E. coli cells. Ampicillin-resistant
colonies are screened for inserts in the proper orientation by restriction analysis (a double
digestion with ApaI and NruI). Clones having inserts in the proper orientation are
isolated and their plasmids are purified. The exemplary vector, pTVMam-Ren, is shown in
Figure 16.
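The insert mass needed for a 3:1 insert:vector molar ratio follows from scaling the vector mass by the length ratio. The sketch below assumes round sizes (about 5,400 bp for the linearized vector, roughly the size of pcDNA3, and 750 bp for the linker-adapted IgG insert) purely for illustration; the actual fragment lengths should be substituted.

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, insert_to_vector_ratio):
    """Insert mass giving the requested insert:vector molar ratio, assuming
    the same mass per base pair for both double-stranded DNAs."""
    return vector_ng * (insert_bp / vector_bp) * insert_to_vector_ratio


# Hypothetical sizes: 5,400 bp vector, 750 bp insert; 200 ng vector as in the text.
print(round(insert_mass_ng(200, 5400, 750, 3), 1))  # 83.3
```

Because moles scale as mass divided by length, the same function gives the 2:1 ratio used for pST-1 by changing the last argument.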

In Figure 16, the location of the cytomegalovirus (CMV) promoter, the multiple
cloning site, the junction region, the KpnI/NheI IgG fragment, the bovine growth hormone
polyadenylation site ("BGH poly A"), the SV40 origin of replication, the neomycin-resistance
gene and the ampicillin-resistance gene are indicated; the direction of transcription is indicated
by the arrows; selected restriction sites are indicated.

Figure 17 provides a diagram showing the sequences comprising the junction region
which contains the hydrophilic spacer and renin endoprotease site present on pTVMam-Ren; a
portion of the sequences encoding the IgG hinge region is also shown. Sequences encoding
the protein of interest are inserted upstream of the sequences encoding the hydrophilic spacer
(see Figure 18). The locations of the NruI and KpnI restriction sites are indicated by the
arrowheads; the location of the cleavage site for renin is indicated by the arrow pointing
between the adjacent leucine residues.

As shown in Figure 18, vectors constructed in this manner have a multiple cloning site
available for ligation to the 5' end of the sequences encoding the protein of interest. Figure
18 depicts a portion of the pTVMam-Ren vector containing the CMV promoter, the multiple
cloning site (indicated by the cluster of restriction sites), the junction region and the IgG
fragment. The NruI site present in the junction region produces a blunt end for the ligation of
the 3' end of the sequences encoding the protein of interest. The sequences encoding the
protein of interest must contain an ATG initiation site and have a full codon represented at their
3' blunt end to enable the production of IgG fusion molecules in mammalian cells using the
pTVMam-Ren vector.
ii) Baculovirus Expression Vectors: Construction
Of pTVBac-kIg

A number of vectors are commercially available for the expression of fusion proteins
in insect cells, including the pBlueBacHis vectors and the pVL1393 vector (Invitrogen). The
pVL1393 vector is modified to permit the expression of the fusion proteins of the present
invention as follows. Several restriction sites are present in the pVL1393 vector that allow
for efficient cloning of inserts. The KpnI/NheI IgG fragment (see Example 2) and a desired
linker sequence are inserted into the pVL1393 vector while maintaining the availability of the
multiple cloning sites for ligation to the 5' end of sequences encoding the gene of interest. In
this example the linker encodes a hydrophilic linker comprising the sequence Arg-Lys-Lys-
Lys (SEQ ID NO:24), a penultimate enhancer (Leu) and a renin endoprotease recognition site.
However, other combinations of hydrophilic spacers and endoprotease sites may be employed.

pVL1393 was modified to generate pTVBac-kIg as follows. Due to the presence of
KpnI sites within the vector (pVL1393), linkers are used to clone the KpnI/NheI IgG fragment
into the BglII site at the 3' end of the multiple cloning site in pVL1393. The KpnI/NheI IgG
fragment is isolated from the pM-Col-K vector (described in Example 2). Digestion of
pM-Col-K with KpnI and NheI releases the KpnI/NheI IgG fragment, which is then purified on
a low-melt agarose (LMA) gel as described in Example 3. This IgG fragment has been
previously shown to encode an authentic and functional form of the KpnI IgG molecule.

Synthetic linkers (a 5' and a 3' linker) are used to insert the IgG fragment into the
BglII site of pVL1393. The 5' linker encodes the hydrophilic spacer and endoprotease site
(renin); this linker has a single-stranded extension at the 5' end which is compatible with a
BglII overhang and a single-stranded extension at the 3' end which is compatible with a KpnI
overhang. The 5' linker is used to join the IgG fragment via the KpnI site to the upstream
BglII overhang on the digested pVL1393 vector. A 3' linker which comprises an NheI
overhang on the 5' end and a BglII overhang on the 3' end is used to join the IgG fragment
via the NheI site to the downstream BglII overhang on the digested pVL1393 vector.

A suitable 5' linker is created by annealing the BKRENf (SEQ ID NO:64) and
BKRENr (SEQ ID NO:65) oligonucleotide pair (see Figure 22) to each other at a
concentration of 20 (in 20 mM Tris-HCl, pH 8.0, 50 mM NaCl, 1 mM EDTA) by heating
to 95°C for 5 minutes and slow cooling to room temperature over 60 minutes. A suitable 3'
linker is formed by annealing the NBf [CTAGC CCX C'C (SEQ ID NO:66)] and NR-




oligonucleotides are hybridized at a concentration of 20 (20 mM Tris-HCl, pH 8.0, 500
mM NaCl, 1 mM EDTA) by heating to 80°C for 5 minutes and slow cooling to 20°C over 60
minutes. The use of this 3' linker converts the NheI site present at the 3' end of the
KpnI/NheI IgG fragment into a BglII overhang while eliminating the BglII site.
The 5' and 3' non-phosphorylated linkers are then ligated to the IgG fragment, and excess
linkers are removed with a CHROMA SPIN-400 column pre-equilibrated with 500 mM NaCl.
The purified fragment contains BglII-compatible ends and is ligated into BglII-digested
pVL1393 vector. The ligation products are used to transform competent bacterial cells [e.g.,
JM101 (Stratagene)]. Clones are screened for the presence of the IgG insert in the desired
orientation by restriction enzyme analysis. The desired clone can be identified by a double
digestion with NheI and EcoRI; clones containing a single copy of the linker-adapted IgG
fragment in the proper orientation will produce a 747 bp NheI/EcoRI fragment. A map of the
pTVBac-kIg vector is shown in Figure 19.
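Orientation screening by double digestion comes down to predicting fragment sizes on a circular map. This Python sketch computes fragment lengths from cut positions; the 10,000 bp plasmid length and all coordinates are hypothetical, and only the 747 bp diagnostic size comes from the text.

```python
def circular_fragments(plasmid_bp, cut_positions):
    """Fragment sizes (largest first) from cutting a circular plasmid at the
    given bp coordinates (any order, duplicates ignored)."""
    cuts = sorted(set(cut_positions))
    if not cuts:
        return [plasmid_bp]  # uncut: still a single circular molecule
    sizes = [b - a for a, b in zip(cuts, cuts[1:])]
    sizes.append(plasmid_bp - cuts[-1] + cuts[0])  # fragment spanning the origin
    return sorted(sizes, reverse=True)


# Hypothetical coordinates: in the desired orientation the NheI and EcoRI cuts
# lie 747 bp apart; a reversed insert moves the NheI site elsewhere.
print(circular_fragments(10_000, [1_000, 1_747]))  # [9253, 747]
print(circular_fragments(10_000, [1_000, 2_500]))  # [8500, 1500]
```

Comparing the predicted patterns for both orientations before running the gel shows which band identifies the correct clone.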

In Figure 19, the location of the polyhedrin (PPH) promoter, the multiple cloning site, the
junction region, the IgG fragment, the ampicillin-resistance gene and recombination sequences
are indicated. The recombination sequences are sequences that flank the polyhedrin gene in
the wild-type AcMNPV and are used in the pTVBac-kIg transfer vector to permit a
homologous recombination event that generates a recombinant virus containing the PPH
promoter and the inserted gene sequences; this recombination event is achieved by
cotransfection of the pTVBac-kIg vector with AcMNPV DNA (Invitrogen) into suitable host
cells such as Sf9 cells (Invitrogen).

Figure 20 provides a diagram showing the sequences present in the junction region of
pTVBac-kIg. The hydrophilic spacer is boxed and comprises the sequence Arg-Lys-Lys-Lys
(SEQ ID NO:24); the penultimate enhancer (leucine) is indicated, the renin site is
enclosed in a box, and the site of cleavage is indicated by the arrow pointing between the
adjacent leucine residues. The first 6 amino acids of the IgG fragment are shown. The sites of
cleavage for BglII, NruI and KpnI are indicated by the arrowheads.

Figure 21 depicts a portion of the pTVBac-kIg vector containing the PPH promoter,
the multiple cloning site (indicated by the cluster of restriction sites), the junction region and
the IgG fragment.

As shown in Figure 21, pTVBac-kIg retains most of the cloning sites present in the
multiple cloning site of the original vector; these sites are available for insertion of the 5' end
must be provided by the sequences encoding the protein of interest. NruI digestion of
pTVBac-kIg provides a blunt end for the ligation of the 3' end of the inserted gene while
preserving the first arginine residue of the hydrophilic spacer. Several variations of
hydrophilic spacers and endoprotease sites can be engineered using the approach described
above to create specific vectors for the production of fusion proteins which can be isolated
using IgG affinity chromatography (i.e., use of Protein A and/or G resins).

For example, the following oligonucleotide pair can be annealed to provide a thrombin
site within the hydrophilic linker in pTVBac-kIg. The thrombin linker is formed by the
TBKF (SEQ ID NO:68) and TBKR (SEQ ID NO:69) oligonucleotide pair.

Figure 22 provides a diagram showing the annealed TBKF and TBKR oligonucleotides
which comprise the thrombin linker. The annealed BKRENf (SEQ ID NO:64) and BKRENr
(SEQ ID NO:65) oligonucleotide pair which comprises the renin linker used in the
construction of pTVBac-kIg (described above) is also shown in Figure 22. Both the thrombin
linker and the renin linker have compatible ends for insertion into the BglII and KpnI sites of
the vector.

Vectors constructed with these specific linkers are suitable for the expression of
sequences encoding proteins of interest which are not susceptible to thrombin and renin
cleavage, respectively. The desired gene is cloned into the appropriate vector using the
techniques described above. The vector contains sequences that allow for replication and
identification of the desired clones in a commercially available bacterial strain such as JM101
(Stratagene).

Once a construct comprising the sequences encoding the desired fusion protein is
isolated, the bacterial cell harboring this construct is grown, and the plasmid DNA is isolated
and purified by standard techniques (e.g., cesium chloride density gradient centrifugation). The
baculovirus transfer vector (e.g., pTVBac-kIg) is used with AcMNPV DNA to co-transfect Sf9
cells to generate a recombinant baculovirus expressing the fusion protein using procedures
known to the art. For example, procedures for transformation, growth and selection of insect
cells expressing recombinant nerve growth factor (NGF) using baculovirus vectors have been
described [U.S. Patent No. 5,272,063, the disclosure of which is herein incorporated by reference].

Briefly, log phase Spodoptera frugiperda (Sf9) cells (Invitrogen) at a density of 2.0 x
10*' cells/ml are allowed to attach to 60 mm tissue culture plates. One microgram of
linearized AcMNPV DNA and 3 µg of purified pTVBac-kIg plasmid containing the desired
(Invitrogen). Twenty microliters of Insectin™ liposomes (Invitrogen) are then added and the
mixture is vortexed (this comprises the transfection mixture). All medium is removed from
the attached growing insect cells (60 mm plate), and the transfection mixture (1 ml) is added
to the cells, which are then incubated on a rocker platform for 4 hours at room temperature.
The virus-containing medium is then removed, 1 ml of fresh protein-supplemented (3.62
g/500 ml) Grace's medium (Invitrogen) is added, and the plate is incubated at 27°C in a
humidified environment for 48 hours. Fresh medium is added and, within 4 days of
incubation of the cells at 27°C, culture supernatants are harvested and titrated on confluent
monolayers of Sf9 cells. Plaques exhibiting no occlusion bodies are picked and replaqued to
generate polyhedrin-negative recombinant viruses. Large-scale, high-titer virus stocks (10^-10^
pfu/ml) are prepared from several isolated recombinant plaques to ensure that a high-producing
virus is isolated.

Production of desired proteins with the titered recombinant virus is conducted as
described [Chan H.W., supra]. Briefly, insect cells (Sf9, Invitrogen) are propagated in
serum-free medium XL-400 (JR Scientific, Woodland CA) at 27°C to a density of 2.0 x 10^ per ml.
The medium is removed and replaced with serum-free medium containing plaque-purified
recombinant virus (using a multiplicity of infection or "MOI" of 0.01-5.0). The medium
containing the recombinant virus is removed after 1 hour of incubation at 27°C and is
replaced with 5 volumes of fresh medium to give a density of 0.4 x 10^ cells/ml. When the
fusion protein encoded by the recombinant baculovirus contains a secretion signal at its
amino-terminus, the log phase cells are infected at an MOI of 0.01 to 0.2 pfu/cell and the
medium is harvested 3-4 days post infection. A higher MOI (5.0 pfu/cell) is used to infect
cells when the fusion protein does not contain a secretion signal, and the cells are harvested
before the 72 hour post infection time point.
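Because MOI is the number of plaque-forming units added per cell, the volume of virus stock needed is a one-line calculation. The cell number and stock titer below are hypothetical.

```python
def inoculum_ml(n_cells, moi, titer_pfu_per_ml):
    """Volume of virus stock that delivers `moi` plaque-forming units per cell."""
    return n_cells * moi / titer_pfu_per_ml


# Hypothetical culture of 1e8 cells and a stock titer of 1e8 pfu/ml.
print(inoculum_ml(1e8, 5.0, 1e8))  # 5.0 (ml, for the higher MOI used without a secretion signal)
print(inoculum_ml(1e8, 0.1, 1e8))  # ml needed for an MOI of 0.1 (secreted fusion protein)
```

The same relationship explains why a fixed volume of a higher-titer stock can cover the low-MOI secreted-protein infections with far less inoculum.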

To harvest fusion protein from the recombinant baculovirus-infected cells, the cells are
removed from the plates and collected by centrifugation. The cell pellet is resuspended in
one-fiftieth (1/50) of the original culture volume in binding buffer (50 mM Tris-HCl, pH 8.0,
450 mM NaCl and 5 mM EDTA) and subjected to repeated cycles of freezing and thawing (at
-70°C and 42°C, respectively). Insoluble debris is removed by centrifugation of the mixture
at 10,000 x g in an SS34 rotor (Sorvall). DNA present in the sample is sheared by passing
the supernatant through an 18 gauge needle. Lysates are passed through a 1 micron filter to
remove any debris that may clog the affinity matrix, and the fusion protein is isolated by

EXAMPLE 5

Generation Of Authentic Protein By Carboxypeptidase Digestion 



The enzymatic removal of carboxy-terminal amino acids from cleaved fusion proteins
to generate authentic proteins is accomplished by taking advantage of the substrate
specificities of the various carboxypeptidases. The most extensively characterized
carboxypeptidases are the mammalian metallocarboxypeptidases A and B (CPA and CPB)
and the serine carboxypeptidase Y from yeast (CPD-Y). The inability of CPA and CPB to
remove carboxy-terminal arginines or prolines [Ambler A. P., Methods Enzymol., 25:271
(1972)] played a key role in the design of the hydrophilic spacers described in the present
invention. Carboxypeptidase Y has a broad specificity (i.e., it can remove a wide variety
of amino acids), including the ability to remove proline.

Carboxypeptidases immobilized to either a diffusional or a limited diffusional matrix
are employed. These enzymes are immobilized on insoluble supports. Insoluble supports
have gained popularity in immobilized enzyme applications because immobilized enzymes
remain active for long periods of time and are recoverable from the reaction mixture, which
reduces cost. Most commercially available matrices comprise synthetic supports produced by the
polymerization of functional monomers to produce an interconnected beaded matrix which is
suitable for affinity chromatography and most enzymatic applications. The critical
characteristics of a particular matrix to consider are 1) particle size (average dimension of a
single bead); 2) pore size (exclusion limit); 3) surface area (m2/gram); and 4) mass transfer
effect (diffusional or flow-through/non-diffusional). The mass transfer effect of a matrix depends
on the nature of the pores in the beads. If the pores connect through the beads and a solution
can flow through the pores, the matrix is considered non-diffusional. If the pores of the beads
are dead ends and solution cannot flow through the pores, the matrix is considered to be
diffusional.

Diffusional matrices are ideal for CPA and CPB digestions as employed in the
methods of the present invention. While not limiting the present invention to any particular
theory, an explanation of the interaction between the fusion proteins and immobilized CPA or
CPB matrices is provided. The majority of the enzyme activity is located within the large
surface area of the pores when CPB is immobilized to Sepharose 4B (Pharmacia). When a
released protein of interest containing an exposed hydrophilic spacer at the carboxy-terminus
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and are acted upon by the immobilized enzyme. Because the diffusion process is slow 
compared to the enzymatic reaction, the probability that the multiple arginine and/or lysine 
residues of the spacer will be removed while the protein of interest is in the pore is high, 
This is advantageous when CPA or CPB digestions are to be performed as spacer designs 
5 which require treatment with these enzymes require that the reaction go to completion in 
order to generate authentic protein of interest. 

In contrast, CPD-Y digestions, which are used to remove proline residues present in
the endoprotease sites used in Level 3 designs, cannot utilize diffusional matrices. Because
CPD-Y can effectively remove all amino acid residues (present in any combination) given
enough time and proper reaction conditions, digestion with CPD-Y must be controlled to
prevent the removal of residues present on the authentic protein of interest. The approach
chosen to control the extent of digestion with CPD-Y was to use CPD-Y immobilized to a
limited diffusional matrix. U.S. Patent Nos. 3,862,030 and 4,169,014 describe suitable limited
diffusional matrices (the disclosure of these patents is herein incorporated by reference).

Enzymatic incubation times are adjusted by varying the flow rate over the thin limited
diffusional matrix containing immobilized CPD-Y.

The experiments described below were designed to confirm that control of flow rate
could be used to control the extent of CPD-Y digestion. Two forms of immobilized
carboxypeptidase were used. The first form comprised a commercially available immobilized
carboxypeptidase Y in which the immobilization medium comprised 4% beaded agarose.

Carboxypeptidases cross-linked to agarose are commercially available and are commonly used
for the carboxy-terminal sequencing of proteins; these protein sequencing protocols involve
the sequential removal of amino acids from the carboxy-terminus of proteins. CPD-Y
immobilized to 4% agarose has traditionally been used for the determination of the amino
acid sequence of peptides and proteins (carboxy-terminal sequencing). The molar
concentration of CPD-Y used in sequencing reactions is kept low compared to the
concentration of substrate (1:1000 to 1:400) in order to promote a non-uniform digestion
which allows a determination of the order of removal of the amino acids.

In contrast, high enzyme to substrate ratios [greater than 1:10 (enzyme:substrate)] were
used in the following experiments in order to promote uniform digestion of the substrate.
Flow rates were adjusted to limit the amount of time that the substrate was exposed to the
immobilized enzyme in order to limit the extent of digestion by CPD-Y.

The following reagents were used in the experiments described below: immobilized
carboxypeptidase Y, 12.5 units/ml cross-linked 4% beaded agarose (Pierce; a diffusional
matrix); 22 mm x 0.5 mm Acti-Disk, GTA-activated (Arbor Technologies, Inc., Pine Brook,
NJ; a limited diffusional matrix); carboxypeptidase Y, 100 units/mg (Sigma); Hytach Peptide
Column (C-18, 2 µm nonporous matrix, 105 x 4.6 mm) [Hewlett-Packard]; unless otherwise
stated, all other chemicals referred to below were obtained from Sigma.

A synthetic 12-residue peptide containing mostly hydrophilic residues was synthesized
(Analytical Biotechnology Services, Boston, MA). This peptide comprises the following
sequence: Ala-Leu-Lys-Asp-Ala-Gln-Thr-Asn-Ser-Ser-Ser-Phe (SEQ ID NO:70); this peptide
is referred to as the control peptide. This peptide represents the carboxy-terminal
peptide that would be generated by digestion of authentic MBP by Staphylococcus aureus V8
(see Example 1). As described below, this peptide proved to be an excellent substrate for
controlled carboxypeptidase digestion experiments.

Ten milliliters of a solution containing 100 µg/ml of the control peptide in PBS (pH
6.5) was repeatedly passed through 1 ml of a matrix consisting of CPD-Y immobilized on
cross-linked agarose (12.5 U/ml, Pierce). The 10 ml sample was applied to the top of a 1.0
cm Affinica column (Schleicher & Schuell) containing 1.0 ml of the immobilized
CPD-Y/agarose and allowed to flow by gravity. An aliquot (100 µl) of the digestion reaction was
taken and the remaining sample was reapplied to the column as the previous sample
approached the matrix (i.e., the sample was recirculated through the column). Aliquots (100
µl) were removed after 1, 5, 11 and 17 passes over the immobilized CPD-Y/agarose column;
20 µl of these aliquots were analyzed by high pressure liquid chromatography (HPLC) using a
TFA/acetonitrile gradient and a C-18 column (Hytach protein column). A Shimadzu HPLC
spectrophotometer equipped with a UV recorder was adjusted to analyze the digestion
products at 210 nm. The following buffers were used for the HPLC analysis: Buffer A
consisted of 1.125% TFA and Buffer B consisted of 1.0% TFA in 80% acetonitrile. To
achieve separation of the peptides produced by CPD-Y digestion of the control peptide, a
gradient of 0-30% Buffer B was applied over a 30 minute period.
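As a cross-check on the substrate concentration, the molarity of a 100 µg/ml solution of the control peptide can be estimated from its sequence. This is a minimal sketch that assumes standard average residue masses and unmodified termini; the mass table and function names are illustrative, not part of the original protocol. It gives roughly 79 µM, in line with the 79 µM substrate concentration cited later in this section.

```python
# Average residue masses (Da) for the residues in the control peptide.
# Assumption: standard average masses, no modifications.
RESIDUE_MASS = {
    "Ala": 71.0788, "Leu": 113.1594, "Lys": 128.1741, "Asp": 115.0886,
    "Gln": 128.1307, "Thr": 101.1051, "Asn": 114.1038, "Ser": 87.0782,
    "Phe": 147.1766,
}
WATER = 18.0153  # one water for the free N- and C-termini


def peptide_mass(sequence: str) -> float:
    """Average molecular mass (Da) of a peptide written as Xxx-Yyy-..."""
    return sum(RESIDUE_MASS[r] for r in sequence.split("-")) + WATER


def micromolar(conc_ug_per_ml: float, mass_da: float) -> float:
    """Convert ug/ml (== mg/l) to micromolar: mg/l / (g/mol) = mM, x1000."""
    return conc_ug_per_ml / mass_da * 1000.0


control = "Ala-Leu-Lys-Asp-Ala-Gln-Thr-Asn-Ser-Ser-Ser-Phe"  # SEQ ID NO:70
mw = peptide_mass(control)
print(f"MW ~ {mw:.1f} Da; 100 ug/ml ~ {micromolar(100, mw):.1f} uM")
```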

Figures 23-25 depict chromatograms generated by the Shimadzu HPLC
spectrophotometer. In Figures 23-25, the numbers appearing over a given peak (OD210)
represent the retention time (in minutes) for a given peptide. Figure 23 shows the retention
time for the undigested control peptide: the large peak seen at 22.4 minutes corresponds to the
generated by passing the control peptide once (Fig. 24A) or five (Fig. 24B) times over the
CPD-Y/agarose column. In Figure 24A, the peaks labelled "C" correspond to full-length
control peptide; the peaks labelled (in bold) 1, 2, 3, 4, 5 and 6 correspond to the control
peptide minus 1, 2, 3, 4, 5 or 6 amino acids, respectively.

Figure 25 shows the retention times for the peptides generated by passing the control
peptide eleven (Fig. 25A) or seventeen (Fig. 25B) times over the CPD-Y/agarose column. In
Figure 25, the peaks labelled "C" correspond to full-length peptide; the peaks labelled (in bold)
5, 6, 7, 8 or 9 correspond to the control peptide minus 5, 6, 7, 8 or 9 amino acids,
respectively.

The results of the flow digestion using immobilized CPD-Y/agarose shown in Figures
23-25 demonstrate the limitations of traditional matrices such as 4% agarose; specifically,
these results show the lack of control possible when using a diffusional matrix. The data
shown in Figures 23-25 indicate that multiple amino acids were released from the
carboxy-terminus of the control peptide on a single pass (see Figure 24), based on the appearance of
multiple peaks. Surprisingly, it took more than 15 passes through the CPD-Y/agarose matrix
to substantially decrease the amount of full-length peptide (Figure 25B); even after 11 passes
through the CPD-Y/agarose column, the full-length control peptide comprised a large
percentage of the molecules present (Figure 25A). These results were surprising because
CPD-Y has the highest affinity for phenylalanine (the carboxy-terminal residue of the
undigested control peptide) of all the amino acids present in the control peptide; therefore, it
was expected that the phenylalanine residue would be rapidly removed from the control
peptide as it was applied to the CPD-Y/agarose matrix. However, the phenylalanine residue
was not removed from a significant percentage of the molecules,
while other, less preferred residues were being removed.

Experiments in which the control peptide concentration (100 µg/ml) was reduced to 50
µg/ml or 10 µg/ml had little influence on the peak pattern seen when an equal amount (1 µg)
was analyzed by reverse phase HPLC as described above. The flow rate was also varied; the
flow rate was slowed from gravity flow (approximately 0.5 ml/min) to 100 µl/min by
attaching an adapter for silicone tubing to the top of the Affinica column, which was then
attached to a peristaltic pump. Lowering the flow rate of the substrate through the Affinica
column containing 1 ml of CPD-Y/agarose decreased the amount of intact (full-length)
peptide seen on the resulting chromatogram but did not change the peak pattern. Control
determine if there were conformational structures limiting the access of the CPD-Y enzyme to
the carboxy-terminus of the small control peptide. The urea samples were digested at slower
rates due to the reduced efficiency of the enzyme in the urea buffer, but the same multiple
peak pattern was observed, indicating the absence of conformational limitations. Analysis of
the chromatograms of the digestion products revealed 9 major peaks; the
accumulation of the peak at 9.9 min. is the 3-mer Ala-Leu-Lys, which would be expected after
the complete digestion of the control peptide with CPD-Y under these reaction conditions
because CPD-Y cannot hydrolyze dipeptides and is very slow at removing tripeptides.

To achieve limited, specific digestion of the control peptide (i.e., removal of the
phenylalanine residue only), approximately 4 mg of CPD-Y (Sigma) was immobilized to a
pre-activated (GTA) Acti-Disk Cartridge (25 mm; Arbor Technologies). The silicon-based
matrix within the Acti-Disk is described in U.S. Patent No. 3,862,030, the disclosure of which
is herein incorporated by reference. Essentially, this matrix is a microporous fluid-permeable
filter that has finely divided hydrophilic filler particles dispersed throughout a microporous
matrix; these particles are capable of binding large amounts of protein or enzyme. The matrix has an
extremely large surface area (80 m²/gram) attributed to a large number of interconnected
pores of non-uniform size distribution. The matrix is 60% porous and is commercially
available in a wide variety of sizes and thicknesses for scale-up. This matrix represents a
limited-diffusional matrix.

Immobilization of CPD-Y to the Acti-Disk cartridge was accomplished by recirculating
10 ml of a solution containing 1 mg/ml CPD-Y in 100 mM Na citrate (pH 6.0) through the
Acti-Disk for 90 minutes at room temperature, according to the manufacturer's protocol. The
OD280 of the CPD-Y solution was measured before and after the immobilization procedure.
The disk was washed extensively with 100 mM Tris-HCl (pH 8.0), 500 mM NaCl buffer until
no protein was detected in the wash. The final concentration of enzyme per available fluid
volume within the disk (65 nmoles/115 µl) was 0.565 mM (35 mg/ml). The porous nature of
the Acti-Disk matrix immobilizes the carboxypeptidase enzyme on the exterior of the matrix,
allowing a more consistent exposure of the substrate to enzyme than do the traditional
cross-linked agarose matrices.
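The 0.565 mM and 35 mg/ml figures follow from 65 nmol of enzyme in 115 µl of fluid volume. The sketch below reproduces that arithmetic. The CPD-Y molecular mass of about 61 kDa is an assumption (a commonly cited value for this glycoprotein), not a figure given in the text. With it, the total enzyme mass also comes out close to the 4 mg stated above.

```python
ENZYME_NMOL = 65.0        # immobilized CPD-Y (from the text)
FLUID_VOLUME_UL = 115.0   # internal fluid volume of the disk (from the text)
CPDY_MW_DA = 61_000.0     # ASSUMED molecular mass of CPD-Y (~61 kDa)

# nmol / ul == mmol / l == mM
conc_mM = ENZYME_NMOL / FLUID_VOLUME_UL
# mM x g/mol == mg/l; divide by 1000 for mg/ml
conc_mg_per_ml = conc_mM * CPDY_MW_DA / 1000.0
# nmol x g/mol == ng; divide by 1e6 for mg
total_mg = ENZYME_NMOL * CPDY_MW_DA / 1e6

print(f"{conc_mM:.3f} mM, {conc_mg_per_ml:.1f} mg/ml, {total_mg:.2f} mg total")
```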

The control peptide (SEQ ID NO:70), at a concentration of 100 µg/ml, was passed
through the CPD-Y Disk at flow rates ranging from 4 ml/min to 100 µl/min; a peristaltic
pump (Pharmacia) was used to regulate the flow. A 2 ml sample of the flow digestion at 4
reaction was allowed to equilibrate. Five milliliters of the solution containing the control
peptide processed at the lower flow rate (3 ml/min) was allowed to pass through the exit
tubing before another 2 ml sample was collected. This procedure was repeated at each
designated flow rate. An aliquot (25 µl) of each sample was run on the Hytach protein
column as described above. The results of these digestions are summarized in Table 4 below.
In Table 4, the time refers to time spent in the reaction chamber (i.e., the Acti-Disk).



TABLE 4

Flow Rate (ml/min)   % Substrate Remaining   Time (sec)   Product (µM)
[row illegible in source]
0.50                 14.0                    13.8         68
1.0                  25.6                    6.9          59
1.5                  32.4                    4.6          54
2.0                  38.7                    3.45         48
2.5                  42.8                    2.76         45
3.0                  46.4                    2.3          43
4.0                  51.5                    1.73         39
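The time column of Table 4 is the 115 µl internal fluid volume of the disk divided by the flow rate. The product column relates to the remaining-substrate column through an initial concentration of about 79 µM, the substrate concentration cited later in this section. Below is a minimal sketch of both conversions. It assumes plug flow through the disk (no dispersion) and takes S0 = 79 µM. The recomputed percentages agree with the tabulated ones to within about one percentage point.

```python
DISK_VOLUME_UL = 115.0  # internal porous volume of the Acti-Disk (from the text)
S0_UM = 79.0            # ASSUMED initial control-peptide concentration (uM)


def residence_time_s(flow_ml_per_min: float) -> float:
    """Seconds spent inside the disk, assuming simple plug flow."""
    return DISK_VOLUME_UL / (flow_ml_per_min * 1000.0) * 60.0


def percent_remaining(product_um: float, s0_um: float = S0_UM) -> float:
    """Percent of substrate left once product_um has been formed."""
    return 100.0 * (s0_um - product_um) / s0_um


# (flow rate in ml/min, product in uM) taken from Table 4
for flow, product in [(0.50, 68), (1.0, 59), (2.0, 48), (4.0, 39)]:
    print(f"{flow:4.2f} ml/min: {residence_time_s(flow):5.2f} s, "
          f"{percent_remaining(product):4.1f} % remaining")
```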



The flow digestion experiments using the control peptide show that CPD-Y
immobilized to the Acti Disk matrix allows for effective control of the incubation time that
the substrate is exposed to the immobilized enzyme. The control peptide was exposed to the
enzyme only long enough to remove one amino acid. This is in contrast to the cross-linked
agarose matrix, which did not allow for such strict control of digestion times due to the
diffusional characteristics of the immobilization matrix.

By varying the flow rate and substrate concentration, it was possible to remove the
carboxy-terminal phenylalanine residue from approximately 96% of the control peptide
molecules with very little degradation beyond this point (see Figures 26 and 27).

Figures 26 and 27 depict chromatograms generated by the Shimadzu HPLC analyzer:
peptide. Figure 26 shows the retention times for the peptides generated using the CPD-Y Acti
Disk and a flow rate of 250 µl/min; Figure 27 shows the retention times for the peptides
generated using the CPD-Y Acti Disk and a flow rate of 100 µl/min.

The data shown in Figures 26 and 27 demonstrate nearly complete processing of a
more favorable substrate (high kcat) without significant processing of the less favorable
substrate (low kcat). These chromatograms (Figs. 26 and 27) have an additional peak at 26.4
minutes, which has been determined to be free CPD-Y enzyme that is being released from the
matrix. This leaking of enzyme from the matrix was eliminated when a new CPD-Y Acti-Disk
was produced by the previously described protocol, followed by a
neutralization/reduction step.

Briefly, a 2 mg/ml solution of CPD-Y was continuously passed through a pre-activated
GTA Acti-Disk for 90 minutes. The disk was then washed sequentially with 10 ml of water
and 10 ml of 1 M NaCl. The remaining aldehyde groups were blocked and the Schiff bases
were reduced by recirculating 0.1 M ethanolamine containing 50 mM sodium
cyanoborohydride through the Acti Disk matrix for two hours at 1 ml/min. The disk was then
washed with 30 ml of 100 mM potassium phosphate, pH 6.2. The final 2 ml of this 30 ml
wash were collected and analyzed on a Beckman DU 7000 spectrophotometer in scanning
wavelength mode (190 nm to 600 nm), and no peaks were seen when compared to the
buffer-only sample.

In order to estimate the activity of the CPD-Y immobilized to the Acti Disk matrix the 
following experiments were conducted. 

a) Use Of N-CBZ-Dipeptides For The Quantitation Of 
Immobilized Enzyme Activity 

The reported Km values for CPD-Y have been generated using extremely low enzyme
concentrations compared to substrate concentration. The value of Km represents the
concentration of substrate that will produce 1/2 the maximum velocity for the initial reaction
rates. During the testing of the immobilized CPD-Y enzyme disks with various dipeptide
substrates, the maximum flow rate that can be achieved without any substantial increase in the
pressure of the solution is 4 ml/min. At this flow rate, the substrate remains within the
immobilized enzyme matrix for 1.74 seconds. This value is determined by dividing the
internal porous volume of the matrix to which the enzyme is immobilized (115 µl) by the flow rate.
immobilization matrix (115 µl), a very high percentage of the substrate was hydrolyzed even
at these short incubation times, which prevents the accurate determination of actual Km values
for the individual substrates. The high level of activity also limits the N-CBZ-dipeptides
that can be used to determine first-order reaction rate kinetic constants to those
dipeptides that do not exhibit substrate inhibition (e.g., N-CBZ-Glu-Tyr, N-CBZ-Ala-Pro,
N-CBZ-Ala-Leu and N-CBZ-Pro-Phe).

The following experiments were designed to characterize the enzymatic properties and
the activity of the Acti Disk-immobilized CPD-Y. All substrates and reagents were purchased
from Sigma.

Standard ninhydrin assay: Hydrolysis of the non-proline-containing N-CBZ-dipeptides
was determined by assaying the amount of free amino acid released by a colorimetric
ninhydrin reaction at a pH greater than 5.2. One gram of ninhydrin was dissolved in 50 ml
ethylene glycol monomethyl ether containing 0.03% ascorbic acid. This ninhydrin solution
was made fresh daily. A 0.4 M citrate buffer, pH 5.2, was made by adjusting the pH of ACS
grade citric acid with NaOH. Two hundred microliter samples at pH greater than 5.2 were
mixed with an equal volume of ninhydrin reagent and incubated at 100°C for 20 minutes.
When 200 µl samples below pH 5.2 were to be assayed, 100 µl of 0.4 M citrate buffer was
included prior to heating the sample to 100°C. Samples were then cooled to room
temperature, 600 µl of 60% ethanol was added as the diluent, and the absorbance was read
at 570 nm on a Beckman DU 7000 spectrophotometer.
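Absorbance readings from the ninhydrin assays have to be converted to concentrations. A common way to do this, assumed here, is a standard curve run with known amino-acid standards. The sketch fits a least-squares line to the standards and inverts it. The standard concentrations and OD570 readings below are hypothetical placeholders, not data from this work, and a linear response over the standard range is also assumed.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# HYPOTHETICAL amino-acid standards (uM) and their OD570 readings
std_conc_um = [0.0, 50.0, 100.0, 200.0, 400.0]
std_od570 = [0.02, 0.14, 0.26, 0.50, 0.98]

slope, intercept = fit_line(std_conc_um, std_od570)


def od_to_um(od: float) -> float:
    """Invert the standard curve to estimate free amino acid (uM)."""
    return (od - intercept) / slope


print(f"OD570 0.62 -> {od_to_um(0.62):.0f} uM free amino acid")
```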

Proline ninhydrin assay: Hydrolysis of proline-containing N-CBZ-dipeptides was
measured with a modified version of the assay described by Magne and Larher [Anal.
Biochem., 200:115 (1992)]. Briefly, a ninhydrin reagent containing 2% ninhydrin in
glacial acetic acid:water 60:40 (v/v) was prepared. Five hundred microliter samples were
combined with 500 µl glacial acetic acid, and 500 µl ninhydrin reagent was added. The
sample was subsequently boiled for one hour. One milliliter of analytical grade toluene was
then used to extract the chromophore, and the absorbance was read at 520 nm.

Concentrations of N-CBZ-Ala-Pro ranging from 4 mM to 100 µM (100 mM citrate,
100 mM NaCl, pH 5.75) were passed through a CPD-Y Acti Disk at a flow rate of 2.0
ml/min. Reaction rates were determined by assaying appropriate dilutions of the various
substrates using the above-described proline ninhydrin assay, specific for free proline, to obtain
OD570 readings below 1.5. The results obtained using the N-CBZ-Ala-Pro dipeptide are
summarized in Table 5 below.



TABLE 5

Substrate Concentration   Velocity
100 µM                    170 nM/min
200 µM                    344 nM/min
300 µM                    503 nM/min
500 µM                    846 nM/min
1000 µM                   1684 nM/min



Reaction rates were directly proportional to substrate concentration (when using less
than 1 mM substrate), which indicates that the flow digestions behave as first-order reactions at these
concentrations.
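If the flow digestion is first order in substrate, velocity divided by substrate concentration should be roughly constant across Table 5. This quick check uses the Table 5 values as printed (velocity units as in the table). The ratios stay within about 2% of their mean, which is consistent with the first-order statement above.

```python
# (substrate uM, velocity as tabulated) from Table 5
table5 = [(100, 170), (200, 344), (300, 503), (500, 846), (1000, 1684)]

ratios = [v / s for s, v in table5]
mean_ratio = sum(ratios) / len(ratios)
max_dev_pct = max(abs(r - mean_ratio) / mean_ratio * 100 for r in ratios)

for (s, _), r in zip(table5, ratios):
    print(f"[S] = {s:4d} uM  v/[S] = {r:.3f}")
print(f"largest deviation from the mean ratio: {max_dev_pct:.1f} %")
```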

N-CBZ-Ala-Pro (dissolved in 100 mM citrate, 100 mM NaCl, pH 5.75) at a
concentration of 200 µM was passed through the same CPD-Y Acti Disk at flow rates
ranging from 2 ml/min to 175 µl/min. Five milliliters at each specific flow rate were
collected and assayed by the proline ninhydrin assay in order to determine the relative
percentage of substrate that was hydrolyzed at the various flow rates. In order to
characterize the amount of substrate hydrolyzed per unit of time, flow rates are converted to
the amount of time that the substrate is within the immobilized enzyme matrix (Acti Disk). The
amount of time the substrate is within the immobilized enzyme matrix at a given flow rate is
shown below in Table 6.



TABLE 6

Flow Rate      Log S/(S-P)   Seconds
2000 µl/min    0.560         3.45
1000 µl/min    0.672         6.90
350 µl/min     1.022         19.7

The log of [Substrate]/([Substrate]-[Product]) versus seconds was plotted and yielded a
line that intersected the Y axis at 0.458 and has a slope of 0.030. The slope of 0.030 is equal
to the first-order reaction rate constant divided by 2.3, according to the integrated first-order
rate equation (kt = 2.3 log [S]/([S]-[P])). This rate constant is thus determined (0.069) and has
units of seconds⁻¹. This value is converted to minutes⁻¹ by multiplying by 60 (60 x .064 =
3.84 min⁻¹). The fact that this line does not intersect the Y axis at 0 is an indication that the
reaction has a higher rate constant in its initial stages and levels off at
a reduced reaction rate constant after the first couple of seconds. This variation of first-order
kinetics was not present when 500 µM N-CBZ-Ala-Pro in the same buffer (100 mM citrate,
100 mM NaCl, pH 5.75) was repeatedly passed through the matrix at the same flow rate
(2000 µl/min). Plotting log [S]/([S]-[P]) versus seconds yielded a line that has a slope of
0.199 and nearly passes through the origin (Figure 28). Table 7 below summarizes the log (S/(S-P))
values obtained when the N-CBZ-Ala-Pro substrate was passed over the CPD-Y Acti Disk matrix
1 to 3 times at a flow rate of 2000 µl/min.

TABLE 7

Number of Passes   Log (S/(S-P))   Time (sec)
1                  0.614           3.45
2                  1.236           6.90
3                  1.870           10.35
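The slope, intercept and rate-constant conversion described above can be reproduced with an ordinary least-squares fit of log10([S]/([S]-[P])) against time, followed by k = 2.3 x slope. This sketch uses ln 10 (about 2.303) for that factor and only the rows printed in Tables 6 and 7. With those three points each, it gives a Table 6 slope of about 0.028 per second (intercept about 0.47) and a Table 7 slope of about 0.18 per second (intercept near zero). These differ somewhat from the 0.030 and 0.199 quoted in the text, which may rest on data points not reproduced in the tables.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def first_order_k(slope_log10_per_s):
    """k = ln(10) * slope of log10(S/(S-P)) vs t; returns (per s, per min)."""
    k_s = math.log(10) * slope_log10_per_s
    return k_s, k_s * 60.0


# Table 6: 200 uM N-CBZ-Ala-Pro, single passes at different flow rates
t6, y6 = [3.45, 6.90, 19.7], [0.560, 0.672, 1.022]
# Table 7: N-CBZ-Ala-Pro, 1-3 passes at 2000 ul/min
t7, y7 = [3.45, 6.90, 10.35], [0.614, 1.236, 1.870]

for name, t, y in [("Table 6", t6, y6), ("Table 7", t7, y7)]:
    m, b = fit_line(t, y)
    k_s, k_min = first_order_k(m)
    print(f"{name}: slope {m:.4f}/s, intercept {b:.3f}, "
          f"k = {k_s:.4f}/s = {k_min:.2f}/min")
```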

These experiments showed that the alanine-proline bond was successfully hydrolyzed
as it flowed through the CPD-Y Acti Disk matrix. The first-order rate constant determined
for the N-CBZ-Ala-Pro substrate is applicable only as a reference for comparison to other
N-CBZ-dipeptide first-order rate constants. N-CBZ-Phe-Ala is the substrate used by Sigma to
characterize the activity of CPD-Y at 100 units per milligram; 1 unit hydrolyzes 1 µmol of
N-CBZ-Phe-Ala per minute. A 20 mM solution of N-CBZ-Phe-Ala in 100 mM citrate
buffer, pH 5.75, containing 100 mM NaCl was passed through a CPD-Y Acti-Disk at flow
rates of 1, 2, 3 and 4 ml per minute. Although the N-CBZ-Phe-Ala substrate displays
substrate inhibition, the two shortest incubation times (3 and 4 ml/min) produced consistent
kinetic constants (k't = 2.3 log [S]/([S]-[P])), which indicates that substrate inhibition had
of digestion of the N-CBZ-Phe-Ala substrate at the flow rate of 4 ml/minute was 41.96
µmol/min. This value was calculated by multiplying the concentration of released alanine
(10.49 mM) by the volume of sample processed in one minute (0.004 liters).
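The 41.96 µmol/min figure is concentration times volume processed per minute. The sketch below reproduces it. For comparison only, it also computes the nominal activity implied by 4 mg of enzyme at Sigma's 100 units/mg, the same product used later in this section. Reading the ratio as "retained activity" is an assumption; substrate inhibition and non-saturating conditions make it a rough figure at best.

```python
released_ala_mM = 10.49      # alanine released at 4 ml/min (from the text)
flow_l_per_min = 0.004       # 4 ml/min expressed in liters per minute

# mM x l/min = mmol/min; x1000 -> umol/min
observed_umol_per_min = released_ala_mM * flow_l_per_min * 1000.0

specific_activity_u_per_mg = 100.0   # Sigma: 1 U = 1 umol N-CBZ-Phe-Ala/min
enzyme_mg = 4.0                      # CPD-Y loaded on the disk
nominal_umol_per_min = specific_activity_u_per_mg * enzyme_mg

print(f"observed {observed_umol_per_min:.2f} umol/min, "
      f"nominal {nominal_umol_per_min:.0f} umol/min, "
      f"ratio {observed_umol_per_min / nominal_umol_per_min:.1%}")
```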

First-order rate constants were approximated for a number of N-CBZ-dipeptides by
passing them through the CPD-Y Acti Disk matrix at a flow rate of 2 ml/min. The amount of
free amino acid released was determined by the previously described ninhydrin assays. Initial
substrate concentrations and the concentrations of free amino acid released are represented in
Table 8 below as log S/(S-P) values, which are used to obtain first-order reaction constants at
3.45 seconds of incubation (2 ml/min flow rate). All reactions were performed in the same
buffer (100 mM citrate buffer, 100 mM NaCl, pH 5.75) at room temperature (~25°C).

TABLE 8

Substrate        Log S/(S-P)   k' (min⁻¹)
N-CBZ-Gly-Pro    0.004         0.160
N-CBZ-Pro-Phe    0.037         1.480
N-CBZ-Ala-Pro    0.560         22.40
N-CBZ-Glu-Tyr    0.606         24.24
N-CBZ-Ala-Leu    0.688         27.52
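Each k' in Table 8 is 2.3 x log S/(S-P) divided by the 3.45 s incubation expressed in minutes, which works out to multiplying the log value by 40. The short sketch below regenerates the k' column. It uses the text's factor of 2.3 rather than ln 10, so the results match the table exactly.

```python
INCUBATION_MIN = 3.45 / 60.0  # 3.45 s at 2 ml/min, in minutes

# log10(S/(S-P)) values from Table 8
table8 = {
    "N-CBZ-Gly-Pro": 0.004,
    "N-CBZ-Pro-Phe": 0.037,
    "N-CBZ-Ala-Pro": 0.560,
    "N-CBZ-Glu-Tyr": 0.606,
    "N-CBZ-Ala-Leu": 0.688,
}


def k_prime(log_ratio: float) -> float:
    """Apparent first-order constant (min^-1): 2.3 * log10(S/(S-P)) / t."""
    return 2.3 * log_ratio / INCUBATION_MIN


for substrate, log_ratio in table8.items():
    print(f"{substrate}: k' = {k_prime(log_ratio):.2f} min^-1")
```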



The rate constants shown in Table 8 above are applicable for comparison to each other 
only and are substantially lower than the published rate constants due to the inability to 
saturate the high enzyme concentrations present within the Acti Disk matrix. 

Results from the hydrolysis of N-CBZ-Ala-Pro in the flow digestion system confirm
that the alanine-proline bond is a sufficiently favorable substrate (high kcat) to be hydrolyzed
within the CPD-Y Acti Disk matrix under the flow conditions described. The
N-CBZ-dipeptides provided no information as to whether a particular substrate could be
hydrolyzed multiple times as it passed through the limited thickness of the Acti Disk matrix. In
order to test the immobilized system for multiple hydrolysis events, a unique substrate was
needed. N-CBZ-Gly-Pro-Leu-Ala-Pro was purchased from Sigma for this experiment because
proline and alanine/leucine amino acids could be separately quantitated using the above

The N-CBZ-Gly-Pro-Leu-Ala-Pro substrate was suspended at a concentration of 200
µM in 100 mM citrate buffer, 100 mM NaCl, pH 5.75 and subjected to flow digestions using
flow rates of either 2 ml/min, 1 ml/min or 250 µl/min. Samples were collected after one pass
through the disk, with the exception of the experiment using a flow rate of 2 ml/min, in which
a sample was collected after 1 or 2 passes. Five milliliters from each flow digestion rate were
saved and each was assayed by both the standard and proline ninhydrin assays. The relative
concentrations of amino acids from the assay results are listed below in Table 9.



TABLE 9

Flow rate (ml/min)   2.0        1.0        0.25       2 x 2.0
Standard             265.7 µM   315.9 µM   384.0 µM   353.4 µM
Proline              197.0 µM   196.5 µM   212.5 µM   195.6 µM



A number of conclusions can be inferred from the analysis of the results obtained using
the N-CBZ-pentapeptide. Hydrolysis of the N-CBZ-dipeptides is substantially slower than
when an amino acid is in the N-CBZ position in a tripeptide or tetrapeptide. This
difference can be approximated by comparing the rate at which the N-CBZ-Pro-Phe dipeptide was
hydrolyzed to the rate at which the proline-leucine bond in the tripeptide position of the
pentapeptide was hydrolyzed. The analysis of the second flow at 2 ml/min using the
pentapeptide provides the necessary information when a couple of assumptions are made. On
the first pass of the substrate through the CPD-Y Acti-Disk matrix, nearly all of the primary
proline and alanine residues were released. This is a logical assumption because the CPD-Y
enzyme can release amino acids sequentially one at a time. The proline ninhydrin assay
indicates that nearly equimolar amounts of proline to substrate were released, and the
hydrolysis rate of the leucine-alanine bond can be approximated from the earlier dipeptide kinetic
data to be ~20 times faster than the subsequent proline-leucine bond. It has been previously
demonstrated that phenylalanine and leucine have nearly equal kinetic constants when they are
in the ultimate (P1') position of N-CBZ-Gly-X dipeptides [Kuhn, Biochem. 13:3871 (1974)].
Assuming nearly all the amino acid released by the second flow at 2.0 ml/min through the
matrix to be the result of the hydrolysis of the remaining Pro-Leu bond, a kinetic constant can
be calculated as previously for the N-CBZ-dipeptides (134.3 µM - 46.6 µM = 87.7 µM
product in 3.45 seconds). The rate constant is over ten times higher for the
N-CBZ-pentapeptide flow digestion confirms is that a second pass of the substrate through the
CPD-Y Acti Disk matrix at the same flow rate (2 ml/min) results in more hydrolysis than
does a single pass which allows twice the incubation time (i.e., using 1/2 flow rate).
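Applying the same first-order expression to the second-pass numbers gives a rough value for the Pro-Leu bond. This sketch rests on one reading of the text: 134.3 µM is taken as the substrate concentration entering the second pass and 46.6 µM as what remains after it. It compares the result with N-CBZ-Pro-Phe from Table 8. It also uses N-CBZ-Ala-Leu from Table 8 as a stand-in for the Leu-Ala bond, which is an assumption about how the "~20 times" estimate was made. Under these assumptions the Pro-Leu constant comes out roughly 12 times the dipeptide value, consistent with "over ten times higher".

```python
import math

INCUBATION_MIN = 3.45 / 60.0  # one pass at 2 ml/min


def k_first_order(s_um: float, remaining_um: float, t_min: float) -> float:
    """k' = 2.3 * log10(S / (S - P)) / t, with S - P given as remaining_um."""
    return 2.3 * math.log10(s_um / remaining_um) / t_min


k_pro_leu = k_first_order(134.3, 46.6, INCUBATION_MIN)  # ASSUMED reading
k_pro_phe = 1.480   # N-CBZ-Pro-Phe, Table 8
k_ala_leu = 27.52   # N-CBZ-Ala-Leu, Table 8 (stand-in for Leu-Ala)

print(f"Pro-Leu k' ~ {k_pro_leu:.1f} min^-1, "
      f"{k_pro_leu / k_pro_phe:.0f}x the N-CBZ-Pro-Phe dipeptide")
print(f"Ala-Leu / Pro-Phe dipeptide ratio ~ {k_ala_leu / k_pro_phe:.0f}")
```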

The above results provide guidance for the maximization and control of the rate of
hydrolysis through the adjustment of the primary amino acid sequence when designing the
spacer and junction regions as described by this invention. As can be inferred from the
inherent preferences of both CPA and CPD-Y for hydrophobic aliphatic amino acids, the ideal
endoprotease for use in this invention would cleave at the amino side of a site with at least
three-residue specificity and would prefer arginines and lysines adjacent to the amino-terminal side of the cleavage site. This
would allow the generation of authentic proteins by a simple CPB digestion. Alternatively, a
preferred endoprotease could cleave at the carboxy side of a specific hydrophobic sequence
that can be efficiently removed by the combination of CPD-Y and CPA digestions, followed
by a CPB digestion to generate the authentic molecule. Two originally designed, preferred
examples are listed below for thrombin and collagenase; these have been deduced from
known substrate specificities for the indicated endoprotease combined with the substrate
specificity of the carboxypeptidases that are used to remove the residual endoprotease
recognition sequence.

Thrombin      Phe-Leu-Ala-Pro-Arg-Gly-Thr (SEQ ID NO:71)
              P5  P4  P3  P2  P1  P1' P2'
              [Chang, Eur. J. Biochem. 151:217 (1985)]

Collagenase   Ala-Pro-Tyr-Gly-Pro-Pro (SEQ ID NO:72)
              P3  P2  P1  P1' P2' P3'
              [Steinbrink, Bond, and Van Wart, J. Biol. Chem. 260:2771 (1985)]

b) Determination Of The Rate Of Hydrolysis For The Lys-Lys 
Pair Represented In The Preferred Hydrophilic Spacer 

A good approximation of the rate of hydrolysis of the Lys-Lys bond of the hydrophilic
spacer is needed to ensure that the flow rates used to generate authentic proteins do not allow
for hydrolysis events which extend past the point of the hydrophilic spacer. Since the
dipeptide hydrolysis data did not give reasonable approximations of the hydrolysis rates of

The peptide sequence is Pro-Leu-Ser-Arg-Leu-Ser-Val-Ala-Lys-Lys (SEQ ID NO:73) (Sigma;
hereinafter referred to as the Lys-Lys peptide). Analysis of the reaction rates required
the use of a modified ninhydrin assay developed by Doi, et al. for the analysis of peptidase
activity [Anal. Biochem., 118:173 (1981)]. Compared to the ninhydrin assays described
above, the pH of the ninhydrin solution is reduced (pH = 4.6) and the heating time is shortened
(less than or equal to 20 minutes) such that only free amino acids react at appreciable rates.
Incubation times were strictly controlled by performing incubations at elevated temperatures in
a programmable thermocycler (Hybaid). The fact that the peptide had a proline residue at its
amino terminus significantly lowered any possible peptide background.

The Lys-Lys peptide was suspended at a concentration of 400 µM in 100 mM citrate
buffer, 100 mM NaCl, pH 5.75 and subjected to flow digestions at 2 ml/min and 500 µl/min.
The amount of free lysine released was quantified by mixing 200 µl of flow sample with 200
µl 0.2 M sodium citrate, pH 4.6, and 100 µl of 2.5% ninhydrin in methyl cellosolve (Sigma)
containing 0.03% ascorbic acid. Samples, peptide-only controls and known standards were incubated at 98°C
for 20 minutes and cooled to room temperature, and 600 µl of 60% ethanol was added as the
diluent. The OD570 was determined and the values were converted to µM of product in order
to determine the first-order rate constant for the Lys-Lys bond in the CPD-Y Acti Disk flow
digestion (k = 0.160). This is the same value determined for the N-CBZ-Gly-Pro dipeptide.
This extremely low hydrolysis rate constant for the Lys-Lys peptide bond allows protein
samples to be passed through the CPD-Y Acti Disk matrix twice at a flow rate of 2 ml/min
without measurable digestion beyond the Lys-Lys pair. Samples are passed through the
CPD-Y Acti Disk matrix twice in order to ensure the complete removal of proline residues.
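One way to illustrate what these apparent first-order constants mean in practice is to convert them into the fraction of bonds cleaved during a given residence time, 1 - exp(-k't). This sketch assumes that the constants (which the text notes are comparative only) apply unchanged. It also assumes that two passes at 2 ml/min simply add their residence times (2 x 3.45 s). Under these assumptions, a bond with k = 0.160 is cleaved only to the extent of a couple of percent, while an Ala-Pro-like bond is cleaved to more than 90%.

```python
import math

TWO_PASSES_MIN = 2 * 3.45 / 60.0   # two passes at 2 ml/min, in minutes


def fraction_cleaved(k_per_min: float, t_min: float) -> float:
    """Fraction of substrate converted in a first-order reaction."""
    return 1.0 - math.exp(-k_per_min * t_min)


for bond, k in [("Lys-Lys (k = 0.160)", 0.160), ("Ala-Pro (k = 22.40)", 22.40)]:
    print(f"{bond}: {fraction_cleaved(k, TWO_PASSES_MIN):.1%} cleaved")
```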

The estimated Km for the Ser-Phe bond is 0.69 mM and the kcat is 400/min [Kuhn,
Biochem. 13:3871 (1974)]; the Ser-Phe bond is the first bond to be cleaved during hydrolysis
of the control peptide. Actual reaction rates obtained for CPD-Y immobilized to the
Acti-Disk indicated that less than 20% of the enzyme activity was present after the
immobilization process. This value was determined by comparing the initial velocity of the
control peptide CPD-Y Acti-Disk flow digestion to the manufacturer's specific activity for
free enzyme (i.e., enzyme present in solution) through the use of the Michaelis-Menten rate
equation. The specific activity for the Phe-Ala substrate, 100 µmoles/mg (Sigma), was converted
to activity by multiplying by the amount of enzyme immobilized within the Acti-Disk matrix
(4 mg). The catalytic constant for the Phe-Ala substrate was calculated by dividing the
6200/min. This value compares well to the value that can be approximated from the kcat data
provided by Kuhn. The kcat value for the cleavage of the Ser-Phe bond by free CPD-Y can
be deduced from Kuhn's data to be approximately 400/min, and this value was used to
calculate the maximum initial velocity (Vmax at [S] = 79 µM) for 100% activity of the
immobilized enzyme. The Vmax for 100% activity is calculated by multiplying kcat (400/min)
by the enzyme concentration (0.065 µmoles/115 µl = 0.565 mM): Vmax = 226 mM/min.
The 100% activity (Vo) for 79 µM flow digestion experiments is calculated from the
Michaelis-Menten equation using the theoretical Km = 0.69 mM deduced from Kuhn's data:
Vo = Vmax[S]/(Km + [S]) = 226 × 0.079/(0.69 + 0.079) = 23.2 mM/min. Initial velocities (Vo)
for control peptide flow experiments were determined by plotting µM of product produced versus
seconds of incubation within the immobilization matrix of the Acti-Disk (see Figure 29).
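The arithmetic in the preceding paragraphs can be reproduced in a few lines. This sketch uses only numbers quoted in the text (0.065 µmol of CPD-Y, equal to 4 mg, in a 115 µl matrix; kcat = 400/min and Km = 0.69 mM for Ser-Phe; [S] = 79 µM; 100 µmoles/mg for Phe-Ala). Treating the Phe-Ala specific activity as per minute is an assumption, and the variable names are illustrative.

```python
# Values quoted in the text; variable names are illustrative only.
enzyme_umol = 0.065      # CPD-Y immobilized in the Acti-Disk (4 mg)
matrix_ml = 0.115        # reaction-matrix volume (115 ul)
kcat_ser_phe = 400.0     # per minute, deduced from Kuhn (1974)
km_ser_phe_mM = 0.69     # mM
substrate_mM = 0.079     # 79 uM control peptide

# Enzyme concentration inside the matrix: umol/ml is numerically equal to mM.
enzyme_mM = enzyme_umol / matrix_ml

# Maximum velocity for fully active enzyme: Vmax = kcat * [E].
vmax_mM_per_min = kcat_ser_phe * enzyme_mM

# Michaelis-Menten initial velocity at the working substrate concentration.
v0_mM_per_min = vmax_mM_per_min * substrate_mM / (km_ser_phe_mM + substrate_mM)

# Catalytic constant toward Phe-Ala, assuming 100 umol/mg is a per-minute activity.
kcat_phe_ala = 100.0 * 4.0 / enzyme_umol

print(f"[E] = {enzyme_mM:.3f} mM, Vmax = {vmax_mM_per_min:.0f} mM/min, "
      f"Vo = {v0_mM_per_min:.1f} mM/min, kcat(Phe-Ala) = {kcat_phe_ala:.0f}/min")
```

The computed kcat toward Phe-Ala (about 6150/min) rounds to the 6200/min quoted in the text, and Vmax and Vo agree with the 226 mM/min and 23.2 mM/min figures.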

In Figure 29, log([S]/([S]-[P])) (in µM) is plotted against the incubation time (in
seconds). For the results shown in Figure 29, the following calculations apply: the 4000
µl/min flow rate through the 115 µl reaction matrix equals 1.73 seconds within the matrix. The
initial reaction rate (Vo) was approximated from the initial slope (3.6 mM/min) of a
logarithmic curve that fit the data points (y = 33.074 + 28.866*LOG(x)). This nearest-fit curve
was necessary because the reaction proceeded to over 45% completion at the first observed
time point (1.73 seconds). Comparison of the observed Vo (3.6 mM/min) to the maximum Vo
(23.2 mM/min) for 100% activity of the immobilized enzyme allowed the determination that the
immobilized enzyme had approximately 15.5 percent of the free enzyme activity.
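The two numbers derived in this paragraph, the time spent inside the matrix and the fraction of activity retained, follow from simple ratios. A minimal sketch using only the values stated above:

```python
flow_ul_per_min = 4000.0   # flow rate through the Acti-Disk
matrix_ul = 115.0          # reaction-matrix volume

# Residence time = volume / flow rate, converted from minutes to seconds.
residence_s = matrix_ul / flow_ul_per_min * 60.0

observed_v0 = 3.6       # mM/min, initial slope of the fitted curve
theoretical_v0 = 23.2   # mM/min, fully active immobilized enzyme
percent_active = 100.0 * observed_v0 / theoretical_v0

print(f"residence time {residence_s:.3f} s, activity retained {percent_active:.1f}%")
```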

The calculations made above were used to approximate the activity of the immobilized
CPD-Y Acti-Disk. There are many reasons why the immobilized enzyme could show such a
decrease in activity compared to the manufacturer's (Sigma) specific activity based on
dipeptide cleavage. The observed reduced rates can result from higher Km values and lower kcat values
for the control peptide as compared to dipeptide results. The hydrophobicity of the control
peptide may influence its ability to bind to the enzyme, but the peptide may also interact with the
immobilization matrix and limit accessibility to the enzyme. These effects and others have
been described in the literature [Goldstein, Methods Enzymol. 44:397 (1976) and Laidler and
Bunting, Methods Enzymol. 64:227 (1980)].

The use of a high concentration of enzyme relative to the substrate allowed the
hydrolysis of the carboxy-terminal phenylalanine with a single pass of the substrate past the
immobilized enzyme, because the following serine-serine bond is cleaved with less efficiency.
Under these conditions, where the second bond of the peptide is substantially less preferred
cleavage of the second less preferred bond. An experiment in which 5 ml of the control peptide
(100 µg/ml) was recirculated through the CPD-Y Acti-Disk at a flow rate that cleaved 86% of
the ultimate bond (Ser-Phe) in a single pass (flow rate of 500 µl/min) resulted in 96% cleavage
without cleavage of the second bond (Ser-Ser). The control peptide was subjected to a single
pass through the reaction disk (at a flow rate of 500 µl/min), followed by 20 minutes
of recirculation at the same flow rate [i.e., the first pass of a twenty-minute recirculation
experiment (5 ml, 10 passes through the matrix)]. This experiment differed from the
column recirculation experiment described above (using the CPD-Y/agarose matrix). In this
experiment (using the CPD-Y/Acti-Disk matrix), the sample volume was 5 ml and the flow loop
was 2 ml; recirculation using a peristaltic pump in the configuration described above results in
the continuous dilution of the 3 ml single flow with secondary flow at the rate of 500 µl/min.
After 3 minutes, the single flow has been diluted in half with secondary flow and tertiary flow
is now entering the system. Under these reaction conditions, the Ser-Ser bond of the control
peptide is not detectably cleaved at this flow rate.

It is not possible to selectively remove the ultimate amino acid from all amino acid
combinations one at a time. The digestion of the control peptide's ultimate residue
(phenylalanine) using immobilized CPD-Y demonstrates the degree of control over the
hydrolysis of favorable amino acids (low Km value) over unfavorable amino acids (high Km
value). This control allows for the removal of the remaining endoprotease recognition
sequence without the digestion of the hydrophilic spacer. It has been reported that modified
conditions (e.g., pH 4.3 vs. pH 7.0) enhance or retard the activity and specificity of
carboxypeptidase Y, and hence its relative reaction rates, for particular amino acids
[Breddam and Ottesen, Carlsberg Res. Commun. 52:55-63 (1987)].

Figure 30 provides a summary of the relative rates of release for carboxy-terminal
amino acids from various dipeptides; these rates were deduced from reviews of CPD-Y
digestions of various substrates and previously described P1 and P1' substrate preferences
[Breddam and Ottesen, supra; Breddam, Carlsberg Res. Commun. 51:838 (1986); Martin, et
al., Carlsberg Res. Commun. 42:99 (1977); Klarskov, et al., Anal. Biochem. 180:28 (1989);
Kuhn, et al., Biochem. 13(19):3871 (1974); Hayashi, et al., J. Biol. Chem. 248(7):2296
(1973); Hayashi, Methods Enzymol. 47:84 (1977)]. The values listed in Figure 30 are
estimates of kcat/Km (mM^-1 min^-1) for peptides under normal digestion conditions based on the
previously described experiments. These values are used to standardize the relative activities
of CPD-Y Acti-Disks. The sequences listed in bold type at the bottom of Figure 30 have the
lowest values and represent those sequences that are present within the described hydrophilic
spacers. The values listed are for N-CBZ-dipeptides and are not directly applicable to the
hydrolysis of proteins or polypeptides (larger than a dipeptide). Each protein substrate to be
digested by the immobilized CPD-Y disk will be tested independently for the relative rate of
hydrolysis of its carboxy-terminal amino acids using a previously standardized CPD-Y Acti-Disk.

Each prepared CPD-Y Acti-Disk must be assayed to determine its maximum activity.
Control peptides or dipeptides having kcat/Km values substantially similar to those of the amino acid
pair which is to be removed (Figure 30) are used to determine relative flow rates that will be
used to selectively remove greater than 90% of the ultimate amino acids from a uniform
population of desired protein substrate in a single pass through a particular CPD-Y Acti-Disk.
The substrate used in the control digestion model (Ser-Phe; bolded and underlined in
Figure 30) is neither the most nor the least preferred substrate of the enzyme.
Carboxypeptidase Y has a preference for particular amino acids in the ultimate and
penultimate positions. Relative rates of digestion for particular pairs of amino acids can range
several fold based on specific affinity (Km) and the pair's influence on hydrolysis rates (kcat).
The flow digestion example demonstrates the selective removal of a preferred amino acid,
when the remaining amino acid is less preferred, without any significant product inhibition. In
cases where the leaving amino acid of the substrate is less preferred than the carboxy-terminal
amino acid of the product, there will be a reduction in the rate of cleavage of the original
substrate due to competition from the product. The product competition in the first flow of a
CPD-Y digestion will be very limited due to the lack of product entering the reaction chamber.
On the second pass of the target molecule through the reaction chamber the competition will
be significant because of the high ratio of product (P1) to original substrate. The competing
reaction will be completely processed to another product (P2), which may or may not
significantly compete for the immobilized enzyme. In either case, the sample is passed
through the reaction matrix again. When the P2 substrate is less favorable (hydrophilic
spacer) compared to the original substrate, the original substrate is substantially processed.
When the P2 substrate is more favorable, it is processed into P3 and subjected to another pass
through the reaction chamber until a less favorable bond is reached (Figure 30).
The competition from products in the CPD-Y flow digestion model is of minimal
consequence because the amino acids comprising the hydrophilic spacers of the present
invention are much less preferred and cannot be digested at the flow rates used to remove
proline residues of endoprotease recognition sequences. In cases where the penultimate
amino acid pair is more preferred, the substrate is passed through the immobilized CPD-Y
Acti-Disk multiple times to ensure the complete removal of the ultimate amino acid. Multiple
Acti-Disk matrices (0.5 mm thickness) can be stacked in a single reaction chamber to provide
longer incubation times (see U.S. Patent 4,169,014) for the slower amino acid pairs of Figure
30, i.e., those with values below 100 mM/min.
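One way to turn a standardized disk into an operating flow rate is to treat the reaction inside the matrix as pseudo-first-order, which is reasonable when [S] is well below Km. Then k_obs = (kcat/Km)·[E]active, the residence time needed to reach a target fraction f is τ = -ln(1 - f)/k_obs, and the maximum flow rate is the matrix volume divided by τ. This is an assumption-laden sketch, not the procedure of the text: the function names are hypothetical, the example reuses the Ser-Phe constants and the roughly 15.5% retained activity from earlier in this example, and stacking disks is modeled simply as multiplying the matrix volume by the number of disks.

```python
import math


def required_residence_min(kcat_over_km: float, active_enzyme_mM: float,
                           target_fraction: float) -> float:
    """Residence time (min) for a first-order reaction to reach target_fraction.
    kcat_over_km is in mM^-1 min^-1; only valid when [S] << Km."""
    k_obs = kcat_over_km * active_enzyme_mM          # per minute
    return -math.log(1.0 - target_fraction) / k_obs


def max_flow_rate_ul_per_min(matrix_ul: float, residence_min: float, disks: int = 1) -> float:
    """Highest flow rate that still gives the required residence time when
    `disks` identical matrices are stacked (volumes assumed to add linearly)."""
    return matrix_ul * disks / residence_min


# Illustrative inputs drawn from this example (assumptions, not measured settings).
kcat_over_km = 400.0 / 0.69          # Ser-Phe, mM^-1 min^-1
active_enzyme = 0.565 * 0.155        # mM, about 15.5% of 0.565 mM retained
tau = required_residence_min(kcat_over_km, active_enzyme, 0.90)
for n in (1, 2, 4):
    print(f"{n} disk(s): flow rate <= {max_flow_rate_ul_per_min(115.0, tau, n):.0f} ul/min")
```

Slower amino acid pairs (smaller kcat/Km) lengthen τ, which is why they call for lower flow rates or stacked disks. A model this simple will not reproduce the measured flow digestions exactly, which is one reason each disk is standardized empirically with control peptides.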



EXAMPLE 6 

Expression Of Fusion Proteins Derived From The NGF/BDNF Family Of Proteins 

This example describes the expression of fusion proteins comprising the NGF/BDNF
family of neurotrophic proteins to further illustrate the use of the hydrophilic spacers of the
present invention and to highlight the factors which are considered when selecting a spacer
design for the expression of a desired protein.

Neurotrophic factors are proteins which function to promote the survival and
maintenance of the phenotypic differentiation of nerve and/or glial cells. Two neurotrophic
factors have been described that are closely related in amino acid sequence but which affect
different, although partially overlapping, sets of responsive neurons. These two neurotrophic
factors are: 1) nerve growth factor (NGF) and 2) brain-derived neurotrophic factor (BDNF).

NGF is a neurotrophic factor for cholinergic neurons in the basal forebrain. BDNF is 
a neurotrophic factor for sensory neurons in the peripheral nervous system. BDNF has been 
proposed to be useful for the treatment of the loss of sensation associated with damage to 
sensory nerve cells that occurs in various peripheral neuropathies [U.S. Patent No. 5,235,043 
to Collins et al, the disclosure of which is herein incorporated by reference]. 

The gene encoding NGF has been isolated from humans and various animals,
including mice; the gene encoding BDNF has been isolated from pigs and humans. There is
significant similarity in amino acid sequences between mature NGFs and mature BDNF,
including the relative position of all six cysteine residues, which is identical in mature NGFs
and BDNF from all species examined. This suggests that the three-dimensional structure of
NGF and BDNF are neurotrophic factors for different, although partially overlapping, sets of
responsive neurons.

Based on the above characteristics, it has been proposed that NGF and BDNF define a
family of structurally related neurotrophic proteins. Additional members of this family have
been isolated and include NGF-2 and NGF-3.

Both NGF and BDNF are synthesized as larger precursor forms (termed preproNGF
and preproBDNF) which are then processed by proteolytic cleavage to produce the mature
neurotrophic factor. These prepro regions are located at the amino terminus of the precursor
molecule and are needed for proper folding and secretion of these proteins. The mature forms
of NGF and BDNF have arginine residues at their carboxy termini, which requires that a
leucine residue be inserted between the naturally occurring arginine and the hydrophilic
spacer. This leucine residue is called a CPB terminator because it prevents CPB from
removing authentic amino acids from the natural protein; the CPB terminator can be removed
with CPA to generate authentic molecules.

The precursor preproNGF molecule is also proteolytically modified at its carboxy
terminus to generate the mature arginine-terminated NGF molecule. The human gene
sequence for the carboxy terminus of the precursor NGF molecule, shown below, codes for
extra arginine and alanine residues. These two amino acids are removed by the dibasic
proteolytic activity of the gamma NGF subunit to generate mature NGF.

Coding Region 

TGT GTG TGT GTG CTC AGC AGG AAG GCT GTG AGA AGA GCC TGA 

Cys Val Cys Val Leu Ser Arg Lys Ala Val Arg Arg Ala Stop 

Mature Carboxy-Terminus Of The Human NGF Protein

Cys Val Cys Val Leu Ser Arg Lys Ala Val Arg 
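The relationship between the coding region and the mature carboxy terminus can be checked by translating the codons with the standard genetic code. This is a minimal sketch; the helper names are illustrative, and the final step simply drops the stop codon and the Arg-Ala dipeptide that the text says is removed by the gamma NGF subunit.

```python
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... (first base outermost).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln", "E": "Glu",
    "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe",
    "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val", "*": "Stop",
}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence (spaces ignored) into one-letter codes."""
    seq = dna.replace(" ", "").upper()
    return "".join(CODON_TABLE[seq[n:n + 3]] for n in range(0, len(seq) - 2, 3))


coding = "TGT GTG TGT GTG CTC AGC AGG AAG GCT GTG AGA AGA GCC TGA"
precursor = translate(coding)             # carboxy terminus of the NGF precursor
mature = precursor.rstrip("*")[:-2]       # drop the Arg-Ala dipeptide removed in vivo
print(" ".join(THREE_LETTER[aa] for aa in precursor))
print(" ".join(THREE_LETTER[aa] for aa in mature))
```

The two printed lines reproduce the coding-region translation and the mature carboxy terminus listed above.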

Both NGF and BDNF require proteolytic processing and formation of the correct
intramolecular disulfide bonds to produce mature, fully biologically active forms of
these proteins. Previous attempts to produce these molecules in bacterial hosts required the
expression of truncated mature NGF sequences in bacteria (i.e., sequences which lack the pro
regions) and further required inefficient in vitro refolding steps to generate active molecules
eukaryotic cells such as mammalian cells permits the proper proteolytic processing of NGF
molecules encoded by the pre-proprotein forms of the gene; however, the expression of the
full-length preproNGF protein in mammalian systems produces low yields of active secreted
mature NGF, and the use of mammalian cells for the production of proteins is costly
[Edwards, et al., Mol. Cell. Biol. 8:2456 (1988)]. Therefore, it is desirable to produce
members of the NGF/BDNF family of proteins in inexpensive host cells such as bacteria.
The following example provides methods for the production of human NGF in bacterial host
cells without the need to use inefficient in vitro refolding procedures to generate biologically
active (i.e., correctly processed and folded) proteins.

Figures 31 and 32 provide the nucleic acid and amino acid sequences of human
preproNGF and preproBDNF, respectively. The nucleic acid and amino acid sequences of
preproNGF are listed in SEQ ID NOS:74 and 75, respectively. The nucleic acid and amino acid
sequences of preproBDNF are listed in SEQ ID NOS:76 and 77, respectively. The sequence of
the mature form of NGF and BDNF is indicated by the large box which encloses the nucleic
and amino acid sequences in each figure. In Figures 31 and 32, underlining is used to indicate
sequences which correspond to sequences present in oligonucleotide primers that can be used
to generate a DNA sequence encoding preproNGF and preproBDNF, respectively. In Figures
31 and 32, amino acids present in the mature form of NGF and BDNF are labeled with positive
numbers; negative numbers indicate amino acid residues which are removed during proteolytic
processing to generate the mature form of NGF and BDNF. Boxes which enclose amino acid
residues only indicate sites susceptible to cleavage by dibasic proteases and/or furin.

NGF and BDNF are examples of preproproteins which have a furin recognition
motif at the proprotein/active protein junction. The pro regions of these molecules allow for
efficient secretion and proper folding of these molecules into active forms. The production of
preproproteins as properly folded, biologically active molecules with removable carboxy-terminal
affinity tails would be a considerable improvement over other recombinant
production methods devised for these molecules.

As shown in Figures 31 and 32, these two members of the NGF/BDNF family of
proteins contain hydrophilic arginine residues at their carboxy termini, making it necessary to
adjust the composition of the hydrophilic spacer designs when expressing these proteins as
fusions with carboxy-terminal affinity domains (e.g., the hinge and Fc portion of IgG). These
prevent aberrant cleavages by endogenous proteases present in the production host (i.e., furin
in mammalian cells).

The nucleotide and amino acid sequences corresponding to the preproprotein forms of
human NGF and BDNF are shown in Figures 31 and 32, respectively. The amino acid
sequences of the mature forms of these molecules are outlined by a thin-lined box in Figures
31 and 32. The bold boxes represent proteolytic sites within the mature protein that are
susceptible to the Kex2 protease used in the dibasic cleavage protocol. The bold boxes
immediately preceding the mature protein sequence represent the recognition sites for the
protease that naturally cleaves at the junction of the mature protein and the pro region to generate
the active molecules. The first 18 amino acids in Figures 31 and 32 represent the secretion
signal sequence that is removed during secretion.

The expression of NGF and BDNF is described below; these represent two exemplary
methods of production for preproproteins using the methods of the present invention. Two
different production methods are described because, although these molecules are very closely
related, their amino acid sequences differ greatly. NGF is produced as a periplasmic proprotein
fusion in E. coli because the mature form of this molecule is not susceptible to dibasic
processing, allowing for the in vitro removal of the amino-terminal pro region. Mature
BDNF has four internal sites that may be susceptible to the Kex2 dibasic cleavage protocol
(i.e., the Lys-Arg and Arg-Arg dipeptide sequences indicated by the small boxes in Figure
32); therefore, an alternate production strategy is employed. BDNF is produced in a
mammalian cell line that naturally produces high levels of furin (e.g., NIH 3T3 or COS-7),
resulting in the secretion of a fusion protein comprising the mature form of BDNF linked to
the carboxy-terminal affinity tail. This allows for improved yields of purified authentic
molecules due to the efficient affinity isolation of fusion molecules from the growth media.

The placement of a leucine residue following the carboxy-terminal arginine residues
present in the NGF and BDNF proteins prevents CPB from removing the natural arginine.
This hydrophobic aliphatic residue (Leu) would also prevent any processing by furin if the
carboxy terminus contained such a recognition motif (Arg-X-Arg/Lys-Arg; SEQ ID NOS:14 &
15). The carboxy-terminal 11 amino acids of the human NGF and BDNF proteins are shown
below using the one-letter symbols for the amino acids. Sequences shown in bold type are
residues encoded by the hydrophilic linker, which encodes the hydrophilic spacer that joins
the protein of interest to the affinity domain (the KpnI/NheI IgG fragment) via sequences
encoding an endoprotease site.



NGF:  CVCVLSRKAVRLKRR-KpnI/IgG

BDNF: CVCTLTIKRGRLKKK-endoprotease-KpnI/IgG

The sequence Leu-Lys-Arg-Arg (SEQ ID NO:78) represents the preferred linker when
1) the desired protein has an arginine amino acid at its natural carboxy terminus, 2) the
mature protein is not susceptible to the dibasic cleavage protocol and 3) the desired host is a
strain of E. coli deficient in proteolysis (e.g., AG1). The hydrophilic spacer (Lys-Arg-Arg; SEQ
ID NO:79) within the preferred linker contains two endoprotease sites susceptible to the Kex2
protease. The sequence Leu-Lys-Lys-Lys (SEQ ID NO:80) represents a preferred linker when
the protein of interest ends with arginine and is going to be expressed in a host that expresses
furin or furin-like proteases. This linker contains a leucine residue and the hydrophilic spacer
Lys-Lys-Lys (SEQ ID NO:19), both of which can be removed by CPA digestion. Authentic
forms of mature NGF and BDNF are generated from the above-described fusion proteins by
digestion with an endoprotease followed by digestion with one or more carboxypeptidases.
The leucine residue (L) following the carboxy-terminal arginine (R) is removed from the
protein of interest with a final carboxypeptidase A digestion (described in detail below).
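The processing order described above (dibasic endoprotease cut, CPB trimming of basic residues stopped by the Leu terminator, then a final CPA step) can be illustrated with a toy string model. This is a simplified sketch, not a kinetic model: the function names are hypothetical, Kex2 specificity is reduced to "cut after Lys-Arg or Arg-Arg", CPA is reduced to "remove C-terminal residues other than Lys, Arg or Pro", only the carboxy-terminal residues shown above are modeled (the pro region and its own furin site are ignored), and "GT" is a hypothetical placeholder for the KpnI/IgG junction. The text states that CPA alone can remove the BDNF Leu-Lys-Lys-Lys linker; the toy instead applies the CPB-then-CPA order described for NGF.

```python
from typing import List

KEX2_PAIRS = {"KR", "RR"}   # dibasic pairs named in the text (Lys-Arg, Arg-Arg)


def kex2_sites(seq: str) -> List[int]:
    """Indices i such that a Kex2-type protease would cut after residue seq[i]."""
    return [i for i in range(1, len(seq)) if seq[i - 1:i + 1] in KEX2_PAIRS]


def kex2_cleave(seq: str) -> str:
    """N-terminal product of cutting at the first dibasic site (toy model)."""
    sites = kex2_sites(seq)
    return seq[: sites[0] + 1] if sites else seq


def cpb_trim(seq: str) -> str:
    """Carboxypeptidase B: remove C-terminal Lys/Arg until another residue is reached."""
    while seq and seq[-1] in "KR":
        seq = seq[:-1]
    return seq


def cpa_trim(seq: str) -> str:
    """Carboxypeptidase A (simplified): remove C-terminal residues other than
    Lys, Arg and Pro, stopping at the first such residue."""
    while seq and seq[-1] not in "KRP":
        seq = seq[:-1]
    return seq


# NGF tail + CPB terminator (L) + spacer (KRR) + hypothetical junction residues.
ngf_fusion = "CVCVLSRKAVR" + "L" + "KRR" + "GT"
released = kex2_cleave(ngf_fusion)            # NGF-Leu-Lys-Arg
ngf = cpa_trim(cpb_trim(released))
bdnf = cpa_trim(cpb_trim("CVCTLTIKRGRLKKK"))  # exopeptidase steps only
print(released, ngf, bdnf)
print("dibasic sites in mature tails:", kex2_sites("CVCVLSRKAVR"), kex2_sites("CVCTLTIKRGR"))
```

Even in this toy form, the Leu residue is what stops CPB short of the authentic arginine, and the mature BDNF tail already contains a Lys-Arg pair, consistent with the decision not to use Kex2 processing for BDNF.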

The carboxy terminal affinity domains used herein are particularly useful in the 
isolation of properly folded prepromolecules because in vivo or in vitro proteolytic processing 
of the amino terminal pro regions can occur without losing the ability to isolate the mature 
product by affinity resin chromatography. 


a) Production Of Mature Active NGF From An E. coli Source 
Without Refolding 

i) Construction Of pTV-TH-NGF 

DNA sequences encoding the proNGF protein [i.e., amino acid residues -104 to 108;
see Figure 31] are inserted into the pTVkIgG-1 expression vector (described in Example 4a) to
produce a fusion protein containing a carboxy-terminal IgG fragment that is secreted into the
periplasmic space, where proper folding and disulfide bond formation may occur. The
The fusion protein encoded by pTV-TH-NGF comprises (from amino to carboxy
terminus) the pho signal sequence, the proNGF protein sequence, a CPB terminator (Leu), a
hydrophilic spacer comprising the sequence Lys-Arg-Arg (SEQ ID NO:79), and the hinge and
Fc domains of human IgG1. The hydrophilic spacer in this situation is also the designed
endoprotease site(s) for the Kex2 protease. The resulting fusion protein is directed to the
periplasmic space due to the presence of the pho signal sequence; the pho signal sequence is
cleaved from the fusion protein during transport to the periplasm. Transport to the
periplasmic space allows for the proper folding and disulfide bond formation within NGF
sequences (without the need to use in vitro refolding procedures). The fusion protein is
recovered from the periplasmic space and affinity purified on a Protein A resin. NGF-Leu-Lys-Arg
is released from the Protein A resin and separated from its pro region by
recirculating a commercially available Lys-Arg and Arg-Arg specific protease (e.g., the Kex2
dibasic protease from yeast, which is available from MoBiTec, Göttingen, Germany) through
the Protein A resin. The pro region of the proNGF protein sequences (i.e., amino acid
residues -104 to -1, see Figure 31) contains a furin processing site, Arg-Ser-Lys-Arg (SEQ ID
NO:39), that will be correctly cleaved at the carboxy-terminal side of arginine (-1) by the
Kex2 protease. Heterologous sequences present on the NGF protein contributed by the CPB
terminator and hydrophilic spacer are released from the NGF protein by digestion with
immobilized carboxypeptidase B and A to produce authentic NGF.

pTV-TH-NGF is constructed as follows. A DNA sequence encoding the proNGF
protein is isolated using the PCR. A developing human brain cDNA library (Clontech) is
used as the template in the PCR. Oligonucleotide primers which bracket the sequences
encoding the proNGF protein are synthesized. Figure 31 shows the full-length preproNGF
protein (SEQ ID NO:75); sequences complementary to the oligonucleotide primers which are
used to amplify the proNGF gene are underlined in Figure 31.

Alternatively, RNA from a human source of Schwann cells known to contain the NGF
mRNA can be used to generate first-strand cDNA as described in Example 3; this
single-stranded cDNA preparation is then used as the template in a PCR to permit isolation of
sequences encoding the proNGF protein.

The commercially available human brain phage library (Clontech) is amplified by
plating the phage at a confluent density on a lawn of lysogenic bacteria such as Y1090
(ATCC No. 37197). The amplified phage are collected from the 150 mm plates by adding 4
hour. The lysates from 10 plates are collected and pooled, and DNA is isolated from 5.0 ml of
the combined amplified lysate as follows. Briefly, starting with 5 ml of phage library liquid
lysate, 50 µg DNase and 250 µg RNase are added and the mixture is incubated for 1 hour at
37°C. The mixture is then centrifuged for 1.5 hours at 132,000 x g at 4°C to collect the
phage particles (in addition, PEG may be added to precipitate the phage particles prior to
centrifugation using standard techniques). The phage pellet is resuspended in 200 µl 50 mM
Tris-HCl, pH 8.0 and transferred to a 1.5 ml microcentrifuge tube; 200 µl of buffered phenol,
pH 8.0, is added and the mixture is vortexed for 20 minutes. The mixture is then centrifuged
for 2 minutes at 13,000 x g in a microcentrifuge and the aqueous layer is removed. Phenol
extractions are repeated until the white precipitate is removed. Chloroform (200 µl) is added
and the mixture is shaken well and then centrifuged briefly. The DNA is precipitated by the
addition of 20 µl of 3 M sodium acetate, pH 4.8 and 2 volumes of 100% ethanol at room
temperature, and then the mixture is centrifuged for 10 minutes at 13,000 x g. The DNA
pellet is washed with 70% ethanol and then resuspended in 100 µl TE, pH 8.0. The isolated
phage library DNA is digested overnight with 50 units of HindIII (NEB) to decrease the
viscosity of the phage DNA preparation prior to PCR amplification (HindIII is used to
decrease the viscosity of the isolated phage DNA preparation because there are no HindIII
sites in the NGF cDNA).

Nucleic acid sequences (e.g., cDNA) encoding the proprotein form of NGF are isolated
using the PCR as follows (it is noted that it is not necessary to isolate the DNA prior to use
in the PCR as described below; a phage lysate may also be employed). A five microliter
aliquot of HindIII-digested phage library DNA or first-strand cDNA (prepared as described in
Example 3) is amplified in a final reaction volume of 100 µl containing 10 µl 10x Pfu
amplification buffer (Stratagene), 0.5 µM each primer [Ngf1 (SEQ ID NO:75) and Ngf2
(SEQ ID NO:76)], 200 µM of each of the four dNTPs and 1 unit of Pfu polymerase
(Stratagene). The reaction mixture is heated to 94°C in a thermal cycler (Perkin-Elmer) for 4
minutes to completely denature the target DNA and subsequently cycled 30 times (94°C for
90 seconds, 50°C for 90 seconds and 72°C for 2.5 minutes). Two microliters of the PCR
products are run on a 2% agarose gel to analyze the amplified product. The PCR products
may be digested with restriction enzymes; restriction digestion of the desired proNGF PCR
products (which are approximately 660 bp in length) with EcoRI will produce two
approximately 330 bp fragments that will appear as a doublet on the agarose gel.
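Writing the cycling conditions down as data makes it easy to check how long the run takes before booking a thermal cycler. This minimal sketch encodes only the hold steps stated above; ramp times between temperatures are ignored because the text does not give them, and the variable names are illustrative.

```python
# Thermal profile from the text: (step, temperature in degrees C, hold in seconds).
initial_denaturation = ("denature", 94, 4 * 60)
cycle = [("denature", 94, 90), ("anneal", 50, 90), ("extend", 72, 150)]
n_cycles = 30

hold_seconds = initial_denaturation[2] + n_cycles * sum(step[2] for step in cycle)
print(f"total hold time: {hold_seconds / 60:.0f} min (ramp times not included)")
```

With 30 cycles of 5.5 minutes each plus the 4 minute initial denaturation, the programmed holds alone come to roughly 2 hours 50 minutes.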



Amplified proNGF DNA fragments are purified by electrophoresing the amplified
reaction products on a 1.5% LMA TAE agarose gel. The approximately 660 bp DNA
fragment is cut from the gel and digested with Gelase following the manufacturer's protocol
(Epicentre Technologies). The 5' end of the Ngf1 oligonucleotide (SEQ ID NO:81) primes
the NGF gene at the beginning of the pro region (Glu at position -104; see Figure 31), and
because Pfu polymerase has 3'-5' exonuclease activity, it produces a blunt-ended product that is
ready for ligation to the vector (as described below, the pTVkIgG-1 vector is digested with
HindIII and the ends are made blunt by treatment with the Klenow fragment). The Ngf2
oligonucleotide (SEQ ID NO:76) alters the nucleotide sequence at the carboxy-terminal end of
the protein to create an NgoMI restriction site near the 3' end of the NGF gene; this alteration
changes the native (i.e., naturally occurring) sequence of AGGA at nucleotides 703 to 706 in
SEQ ID NO:74 to CGGC. This change does not alter the amino acid sequence of the NGF
protein in the final construction (see below) but adds a restriction site which aids in the
cloning of the desired synthetic linker encoding a hydrophilic spacer and endoprotease site.

NGF 5'-TTTATCCGGATAGATACGGCCTGTGTGTGTGTGCTCAGCAGGAAGGCTGTGAGA-3'
    3'-AAATAGGCCTATCTATGCCGGACACACACACACGAGTCG GCCGCCC-5' (SEQ ID NO:82)

The purified PCR product is digested with NgoMI (NEB) according to the manufacturer's
protocol, phenol extracted, precipitated with 2.5 volumes of ethanol and resuspended in 20
mM Tris-HCl, pH 8.0 to generate a compatible end for the ligation of a synthetic
linker/adapter formed by annealing together the NGOKP1 (SEQ ID NO:83) and NGOKP2
(SEQ ID NO:84) oligonucleotides; the annealing is conducted as described below. The
annealed oligonucleotides form the following double-stranded sequence, which has a
single-stranded extension at the 5' end that is compatible with NgoMI ends and a single-stranded
extension at the 3' end that is compatible with KpnI ends:

5'-CCCiGAAGGCTGTGAGACTTAAGCGGCGGGGrAC-3'  NGOKP1 (SEQ ID NO:83)
3'- n CCGACACTCTGAATTCGCCGCCC-5'          NGOKP2 (SEQ ID NO:84)






The NGOKP1 and NGOKP2 oligonucleotides are annealed together at a concentration
of 1 (each) in 50 µl TE (pH 8.0), 50 mM NaCl by heating to 85°C and slow cooling to



expression vector. The ligation of the synthetic linker/adapter to the NgoMI ends on the
proNGF PCR product regenerates the original amino acid sequence at the carboxy terminus of
the NGF protein. The linker/adapter also omits the natural dipeptide (Arg-Ala at positions
109-110 in Figure 31) that is not present in the mature product.
The pTVkIgG-1 vector is prepared by digesting 5 µg of the vector DNA with 25 units
of HindIII in a 50 µl volume for 90 minutes at 37°C. The HindIII ends are then filled in by
adding 2.5 µl of 0.5 mM each dNTP and 5 units of the Klenow fragment and incubating the
mixture for 15 minutes at 30°C. The reaction is stopped by heating the mixture for 10 min at
75°C. The buffer is then changed by passing the reaction mixture through a CHROMA SPIN-1000
column (Clontech) according to the manufacturer's directions. To the flow-through, 5
µl of KpnI buffer and 25 units of KpnI are added. The mixture is incubated for 90 min at
37°C. The reaction mixture is then extracted with phenol, precipitated with ethanol and
resuspended in 40 µl of 20 mM Tris-HCl, pH 7.5.

The prepared insert (blunt-proNGF/linker/adapter-KpnI) is mixed with the prepared
pTVkIgG-1 vector at a 3:1 (insert:vector) ratio in a 20 µl volume comprising 1X T4 ligase
buffer (NEB), 50 µM ATP. T4 DNA ligase (200 units) is then added and the reaction is
incubated for two hours at 16°C. The ligation products are then used to transform competent
AG1 cells (Stratagene). The transformed bacteria are plated on LB plates containing
ampicillin; individual ampicillin-resistant colonies are picked into 1 ml of LB/Amp medium
and grown overnight at 37°C in a shaker incubator. Plasmid DNA is isolated using standard
techniques and digested with NcoI and SmaI to identify clones with a single insert in the
proper orientation. Positive clones are identified by the release of a single insert of 0.5 kb.
Isolated positive colonies are screened for IgG production as described in Example 1a.
Colonies containing plasmids having the desired insert (by restriction analysis) and which
produce a high titer of IgG are sequenced to confirm that the inserted DNA encodes the
desired proNGF fusion protein.
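If the 3:1 insert:vector ratio is read as a molar ratio (the text does not say whether it is molar or by mass), the insert mass for a ligation follows from the ratio of fragment lengths. This is a minimal sketch; the function name is hypothetical, and the vector size and vector amount below are placeholder values because the text gives the insert size (about 660 bp) but not the size of pTVkIgG-1 or the mass of vector per ligation.

```python
def insert_mass_ng(vector_ng: float, vector_bp: int, insert_bp: int,
                   molar_ratio: float = 3.0) -> float:
    """Mass of insert (ng) giving `molar_ratio` insert molecules per vector molecule."""
    return molar_ratio * vector_ng * insert_bp / vector_bp


# Hypothetical inputs: 100 ng of a 5,000 bp vector and the ~660 bp proNGF insert.
print(f"{insert_mass_ng(vector_ng=100, vector_bp=5000, insert_bp=660):.1f} ng insert")
```

Substituting the actual vector size and amount gives the insert mass to add to the 20 µl ligation.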

Figure 33 provides a schematic map of the pTV-TH-NGF vector. The locations of the
trc promoter, the pho signal sequences, the proNGF sequences, the junction region, the IgG
fragment, the ampicillin-resistance gene and the lac repressor (lacIq) gene are indicated. The
direction of transcription is indicated by the use of arrows inside the circle.

Figure 34 shows the nucleotide and amino acid sequences present at the junction region in pTV-TH-NGF. Sequences present at the carboxy-terminal end of the NGF protein.
the IgG fragment (the affinity domain) are indicated. As shown in Figure 34, a leucine residue separates the hydrophilic spacer from the arginine residue which is present at the carboxy-terminus of NGF. This hydrophilic spacer separates the authentic carboxy-terminus from the KpnI-IgG Fc fragment. The carboxy-terminal sides of the arginine residues within the hydrophilic linker are both substrates for Kex2 (Lys-Arg, Arg-Arg), while the leucine residue provides a barrier to CPB digestion so that authentic NGF can be generated with a final CPA digest.



ii) Large Scale Production Of NGF Fusion Protein In Bacteria 

One-liter cultures of bacteria containing a vector encoding an NGF fusion protein are started by inoculating 100 ml of LB/Amp 100 (in a 250 ml flask) with a single NGF-positive colony from a fresh plating or from cells stored in glycerol (prepared from mid-log cultures inoculated from single colonies). The starter cultures are incubated at 37°C with good aeration (250 rpm) until mid-log phase is reached. One liter of LB/Amp 100 (in a 2.8 liter flask) is then inoculated with the starter culture and grown at 37°C with good aeration to an OD600 of 0.600; IPTG is then added to 1 mM final concentration and growth is continued for 2.5 to 3.0 hours.

The induced cells are then harvested and the periplasmic protein fraction is isolated by the cold osmotic shock method (Riggs, supra) as follows. Cells are harvested by centrifugation for 10 min at 7500 x g [7000 rpm in a JA-14 rotor (Beckman)] at 4°C. The cell pellet is resuspended in 400 ml of 30 mM Tris-HCl (pH 8.0), 20% sucrose. Eight-tenths of a milliliter of 0.5 M EDTA (pH 8.0) is then added and the mixture is incubated for 5 to 10 minutes at room temperature with shaking. The mixture is then centrifuged for 10 min at 10,000 x g at 4°C and the pellet is resuspended in 400 ml of ice-cold 5 mM magnesium sulfate. The mixture is shaken or stirred for 10 minutes in an ice bath. The mixture is then centrifuged at 10,000 x g for 10 min at 4°C and the supernatant is recovered.

Twenty milliliters of 1 M Tris-HCl (pH 8.0) and 37.8 ml of 5 M NaCl are added to the supernatant. The mixture is then prepared for chromatography on a Protein A affinity column by passing it through a 0.45 micron filter to remove large particulate matter. The protein concentration of the filtered mixture is measured using a Coomassie assay kit (Sigma) and the approximately 450 ml sample is passed through a 1.0 cm Protein A column as described in Example 2. The column contains 0.1 ml of immobilized Protein A per milligram
procedure described in Example 2). The cold osmotic shock fluid is kept at 4°C prior to flowing through the Protein A column. After passing the sample through the Protein A matrix, the column is washed with 20 ml binding buffer (20 mM Tris-Cl, pH 8.0, 450 mM NaCl, 5 mM EDTA) and then with 20 ml thrombin cleavage buffer (25 mM Tris-Cl, pH 8.0, 150 mM NaCl, 2.5 mM CaCl2).
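The buffer additions in the osmotic shock and Protein A loading steps can be checked with a simple mixing calculation. The sketch below assumes that volumes are additive, that the base solution contains none of the added solute, and that the shock-fluid supernatant is about 400 ml (the volume of magnesium sulfate used to resuspend the cells); these are assumptions, not measured values.

```python
# Sketch: final concentrations after adding stock solutions to a base volume.
# Assumes additive volumes and a base solution free of the added solutes.
from typing import List, Tuple


def mix(base_ml: float, additions: List[Tuple[float, float]]) -> Tuple[List[float], float]:
    """Return (final mM of each (stock_M, added_ml) addition, final volume in ml)."""
    total_ml = base_ml + sum(added_ml for _, added_ml in additions)
    # stock_M * added_ml gives mmol; dividing by the total volume in litres gives mM
    final = [stock_m * added_ml * 1000.0 / total_ml for stock_m, added_ml in additions]
    return final, total_ml


if __name__ == "__main__":
    # EDTA step: 0.8 ml of 0.5 M EDTA into 400 ml sucrose buffer
    (edta_mM,), _ = mix(400.0, [(0.5, 0.8)])
    # Loading step: 20 ml 1 M Tris-HCl and 37.8 ml 5 M NaCl into ~400 ml shock fluid (assumed)
    (tris_mM, nacl_mM), load_ml = mix(400.0, [(1.0, 20.0), (5.0, 37.8)])
    print(f"EDTA {edta_mM:.2f} mM; load {load_ml:.1f} ml, Tris {tris_mM:.1f} mM, NaCl {nacl_mM:.1f} mM")
```

Under these assumptions the EDTA step gives about 1 mM EDTA, and the loaded sample is about 458 ml containing roughly 44 mM added Tris and 413 mM NaCl, consistent with the approximately 450 ml sample described above.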

Two column volumes of thrombin cleavage buffer containing 50 µg/ml of the Kex2 endoprotease (MoBiTec) are recirculated through the Protein A matrix at a flow rate of 500 µl/min for 90 minutes at 24°C. The recirculated fluid is collected at the end of the digestion using the recirculation loop. The Protein A matrix is washed with two column volumes of thrombin buffer, and the wash is combined with the recirculated digest and incubated at 30°C for 30 minutes to allow any remaining pro junctions to be cleaved. The fractions are pooled, and the amount of released NGF protein and the purity of the preparation are determined by electrophoresing 10 µl of the sample on a 12.5% non-reducing SDS-PAGE gel followed by staining with Coomassie blue. Standards comprising NGF (Sigma) and molecular weight markers are included to indicate proper processing. The isolated protein is stored on ice or at -20°C prior to removal of non-NGF amino acids from the carboxy-terminus by digestion with carboxypeptidase.
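Recirculating a small volume of enzyme over the bound fusion protein means each portion of the digest passes the matrix many times. The sketch below estimates the number of passes and the total Kex2 in the loop. The 1 ml bed volume is hypothetical, and the Kex2 concentration is taken as 50 µg/ml, which is a reading of a partly illegible value in the protocol.

```python
# Sketch: passes of a recirculated digest over a column bed, and enzyme in the loop.
def passes(flow_ul_per_min: float, minutes: float, loop_volume_ml: float) -> float:
    """Total volume pumped divided by the volume being recirculated."""
    pumped_ml = flow_ul_per_min * minutes / 1000.0
    return pumped_ml / loop_volume_ml


def enzyme_ug(conc_ug_per_ml: float, volume_ml: float) -> float:
    """Total enzyme present in the recirculating volume."""
    return conc_ug_per_ml * volume_ml


if __name__ == "__main__":
    COLUMN_VOLUME_ML = 1.0              # hypothetical bed volume
    loop_ml = 2 * COLUMN_VOLUME_ML      # "two column volumes" of cleavage buffer
    print(passes(500, 90, loop_ml))     # 22.5 (45 ml pumped through a 2 ml loop)
    print(enzyme_ug(50, loop_ml))       # 100.0 (Kex2 at an assumed 50 ug/ml)
```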

The carboxy-terminus of the isolated and Kex2-digested NGF protein carries the following amino acid residues, contributed by the CPB terminator and the hydrophilic spacer, which must be removed in order to generate the authentic form of mature NGF:



(MATURE NGF)-Leu-Lys-Arg 



These non-NGF amino acids are removed using preparations comprising immobilized carboxypeptidases. The carboxypeptidase enzymes used should be of the highest quality available, preferably chromatographically purified (available from Sigma). The buffer is first changed by passing the sample through a Sephadex G-50 column (Pharmacia) as follows. The column is prepared using a volume of Sephadex G-50 equal to 4 times the protein sample volume, and the column is equilibrated with 100 mM citrate/NaCl buffer, pH 5.75 (100 mM citric acid, 150 mM NaCl, pH adjusted with NaOH). The protein sample containing released NGF, the Kex2 protease and fragments of the digested
maximum flow rate without increased pressure. Sample fractions are collected as the protein components elute by size: Kex2, released NGF, digested pro region fragments. Fractions containing the second elution peak (measured by absorbance at 280 nm) are pooled as released NGF.

The Kex2-digested NGF protein isolated above is concentrated to a final concentration of 2 mg/ml using a Centriprep-10 (Amicon), and the solution is adjusted to pH 8.1 with NaOH. One hundred microliters of CPB-Sepharose (prepared as described below) is added per milliliter of protein solution (at 2 mg/ml, pH 8.1) and the mixture is incubated for 2 hours at 25°C with end-over-end rotation.

CPB-Sepharose is prepared as described [Sassenfeld and Brewer, Bio/Technol. 2:76 (1984) and U.S. Patent No. 4,532,207 to Brewer et al., the disclosure of which is herein incorporated by reference]. Briefly, 20 mg of carboxypeptidase B-DFP (i.e., diisopropyl fluorophosphate treated) Type I (Sigma) in 10 ml of 0.1 M NaHCO3 (pH 8.3) was added to
ml of CNBr-Sepharose (Pharmacia). The mixture was incubated for 16 hours at 4°C. The CPB-Sepharose is stored in PBS containing 0.1% azide at 4°C.

The above-described procedure (exposure of the Kex2-digested NGF protein to CPB-Sepharose) efficiently removes only the carboxy-terminal arginine and lysine. In preparation for CPA digestion, 1/10 volume of 1 M ammonium carbonate, pH 8.5, is added and the pH of the sample is then adjusted to 8.5 with NaOH. Ten units of immobilized CPA (Sigma) are added to the sample for every µmol of substrate present. The reaction is incubated for 3 hours at room temperature (25°C) with end-over-end rotation to ensure adequate mixing of substrate with the immobilized matrix. The immobilized CPA is removed by filtration. This reaction can be monitored by analyzing 200 µl fractions with the ninhydrin reaction for released free amino groups as described above (Doi, et al., supra). The reaction is complete when a molar equivalent of leucine residues has been released to generate authentic NGF. Additional chromatography steps (e.g., ion exchange, gel filtration, RP-HPLC and/or FPLC) may be employed to gain even higher purity of the recombinant NGF.
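The logic of the CPB terminator can be illustrated by modelling each carboxypeptidase as removing C-terminal residues until it meets one it cannot release. The sketch below uses simplified, textbook-style specificity rules (CPB releases only Lys and Arg; CPA releases most residues but not Arg, Lys or Pro); these are assumptions that ignore kinetics and partial activities, and "Xaa" is a hypothetical placeholder for the rest of mature NGF, which the text states ends in arginine.

```python
# Sketch: sequential carboxypeptidase trimming of the Kex2 product.
# Simplified specificity rules (assumptions, not kinetics):
#   CPB releases C-terminal Lys or Arg only.
#   CPA releases C-terminal residues other than Arg, Lys and Pro.
from typing import Callable, List

Residues = List[str]


def cpb_cleaves(residue: str) -> bool:
    return residue in {"Lys", "Arg"}


def cpa_cleaves(residue: str) -> bool:
    return residue not in {"Arg", "Lys", "Pro"}


def trim(chain: Residues, cleaves: Callable[[str], bool]) -> Residues:
    """Remove C-terminal residues one at a time until the enzyme is blocked."""
    out = list(chain)
    while out and cleaves(out[-1]):
        out.pop()
    return out


# "Xaa" is a hypothetical placeholder for the rest of mature NGF (which ends in Arg).
MATURE_NGF_END = ["Xaa", "Arg"]
kex2_product = MATURE_NGF_END + ["Leu", "Lys", "Arg"]

after_cpb = trim(kex2_product, cpb_cleaves)   # ['Xaa', 'Arg', 'Leu']
after_cpa = trim(after_cpb, cpa_cleaves)      # ['Xaa', 'Arg']

# Without the Leu "CPB terminator", CPB would also strip NGF's own C-terminal Arg.
no_barrier = trim(MATURE_NGF_END + ["Lys", "Arg"], cpb_cleaves)  # ['Xaa']

print(after_cpb, after_cpa, no_barrier)
```

The last case shows why the leucine is placed between the NGF carboxy-terminus and the spacer: without it, CPB would not stop at the authentic arginine.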

Gel filtration on a Sephadex G-25 column equilibrated with 0.01 M sodium phosphate, pH 7.0, 0.1 M NaCl is used to separate the NGF molecule from the amino acids released by the carboxypeptidase digestions and to prepare the sample for a final ion exchange chromatography step that separates any unprocessed protein.
b) Production Of Mature BDNF In Mammalian Cells

i) Construction Of An Expression Vector
Encoding A BDNF/IgG Fusion Protein

As shown in Figure 32, the mature form of human BDNF ends with a carboxy-terminal arginine residue, and the carboxy-terminal amino acids contain only a portion of the furin motif (e.g., Arg-Gly-Arg). Like other proteins in this family, BDNF contains hydrophilic amino acids at its carboxy-terminus; therefore, additional consideration is needed in the design of the hydrophilic spacer. Because of the presence of internal dibasic (Lys-Arg) sites within the mature BDNF molecule (see small boxes shown in Figure 32), BDNF is not a candidate for the in vitro removal of the pro region from the fusion protein as was described above for KGF. Instead, the preproBDNF protein is expressed as a fusion with the IgG fragment; the BDNF and IgG domains are joined via a hydrophilic spacer and sequences which provide a recognition site for the endoprotease renin. The expression vector encoding the BDNF fusion protein is expressed in mammalian cells which produce high levels of furin (e.g., kidney and liver cell lines). This endogenous furin is used to remove the pro region from the BDNF fusion protein in vivo; the secreted fusion protein comprises the mature form of BDNF joined to the IgG affinity domain. The affinity domain is removed from the BDNF protein by digestion with renin, and authentic BDNF is then generated by treatment of the renin-digested BDNF with carboxypeptidases.


ii) Production Of BDNF In Mammalian Host Cells
With In Vivo Processing Of The Pro Region

The human brain cDNA library (Clontech) used to amplify the gene sequences encoding NGF can also be used to amplify the gene sequence for BDNF as described above. Sequences encoding the full-length gene for the preproBDNF protein are isolated from this library using PCR amplification. Two primers which are complementary to the 5' and 3' ends of the coding region were synthesized (National Biosciences). The 5' primer [BDNF-5 (SEQ ID NO:85)] begins exactly at the ATG start codon for the BDNF gene (see underlining at the 5' end of the BDNF gene shown in Figure 32). The 3' primer [BDNF-3 (SEQ ID NO:86)] hybridizes to the first-strand cDNA nine bases 5' of the stop codon (see underlining at the 3' end of the BDNF gene shown in Figure 32).

As shown below, the degeneracy of the codons allowed the creation of an MluI site, which allows for the cloning of the modified linker; this linker is required because the mature BDNF protein has an arginine at its carboxy-terminus.

Native Sequence: 

ARG-ILE-ASP-THR-SER-CYS-VAL-CYS-THR-LEU-THR-ILE-LYS-ARG-GLY-ARG-STOP
AGG ATA GAC ACT TCT TGT GTA TGT ACA TTG ACC ATT AAA AGG GGA AGA TAG
TCC TAT CTG TGA AGA ACA CAT ACA TGT AAC TGG TAA TTT TCC CCT TCT 

Modified Sequence: 

ARG-ILE-ASP-THR-SER-CYS-VAL-CYS-THR-LEU-THR-ILE-LYS-ARG-GLY-ARG-STOP
AGG ATA GAC ACT TCT TGT GTA TGT ACA TTG ACC ATT AAA CGC GTC CCA TAG
TCC TAT CTG TGA AGA ACA CAT ACA TGT AAC TGG TAA TTT GCG CAG GG

The PCR is conducted as follows. Reactions (100 µl final volume) are assembled which contain 1X Pfu buffer (Stratagene), 1 µM of each of the BDNF-5 and BDNF-3 primers, 200 µM of each of the four dNTPs, 1 unit Pfu polymerase (Stratagene) and 5 µl of the phage library DNA (isolated as described in section a, above; the BDNF cDNA does not contain any HindIII sites). Cycling is performed in a thermal cycler (Perkin-Elmer) for 30 cycles comprising 95°C for 90 sec, 50°C for 60 sec and 72°C for 2 min.
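The PCR recipe above specifies final concentrations, so pipetting volumes follow from C1V1 = C2V2. In the sketch below the final concentrations and the 5 µl of library DNA come from the text (the primer concentration is read as 1 µM), while every stock concentration is a hypothetical example.

```python
# Sketch: pipetting volumes for one 100 ul PCR from stock solutions (C1*V1 = C2*V2).
# Final concentrations follow the text; all stock concentrations are hypothetical.
TOTAL_UL = 100.0


def stock_volume(final: float, stock: float, total: float = TOTAL_UL) -> float:
    """Volume of stock (same units as `total`) giving the `final` concentration."""
    return final * total / stock


volumes = {
    "10X Pfu buffer": stock_volume(1.0, 10.0),                # 1X final
    "BDNF-5 primer (20 uM stock)": stock_volume(1.0, 20.0),   # 1 uM final
    "BDNF-3 primer (20 uM stock)": stock_volume(1.0, 20.0),   # 1 uM final
    "dNTP mix (10 mM each)": stock_volume(200.0, 10_000.0),   # 200 uM each final
    "Pfu polymerase (2.5 U/ul)": 1.0 / 2.5,                   # 1 unit total
    "phage library DNA": 5.0,                                 # fixed volume from the text
}
volumes["water"] = TOTAL_UL - sum(volumes.values())

for name, ul in volumes.items():
    print(f"{name:30s} {ul:6.1f} ul")
```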

The desired BDNF PCR product (approximately 750 bp) is confirmed by running the reaction products on a 2% low melting temperature agarose gel and isolating the 750 bp product using the Gelase protocol (Epicentre). The isolated fragment can be analyzed by digestion with NcoI, which cuts the BDNF gene three times, resulting in an approximately 350 bp restriction fragment on a 2% agarose gel.

Isolated BDNF PCR product is digested with MluI (NEB) to create a compatible cohesive end for ligating the BDNF/Renin linker (described below), ethanol precipitated and resuspended at a concentration of 100 ng/µl.

The BDNF/Renin linker is constructed by annealing together the complementary oligonucleotides BD/rnF (SEQ ID NO:87) and BD/rnR (SEQ ID NO:88) (both obtained from NBI); annealing is conducted as described in section (a)(i) above. The annealed oligonucleotides form the following double-stranded sequence, which has a single-stranded extension at the 5' end that is compatible with MluI ends and a single-stranded extension at the 3' end that is compatible with KpnI ends:

5'-CGCGGAAGACTTAAGAAGAAACTGCCGTTCCACCTGCTGTACGGTAC-3' BD/rnF
3'-    CTTCTGAATTCTTCTTTGACGGCAAGGTGGACGACATGC-5'     BD/rnR
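The linker design can be checked by confirming that the two oligonucleotides pair with 4-nucleotide MluI- and KpnI-compatible overhangs, and by translating the junction they create. In the Python sketch below the oligonucleotide sequences are transcribed from the text; the flanking ATTAAA (the Ile-Lys codons preceding the MluI-cut end of BDNF, from the native sequence shown above) and the final C that completes the vector's KpnI site (GGTACC) are assumptions about the ligated product. Output uses one-letter amino acid codes.

```python
# Sketch: sanity-check the BDNF/Renin linker oligos and translate the junction they create.
# Oligo sequences are transcribed from the text; "ATTAAA" and the final "C" are assumed.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {  # standard genetic code
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def revcomp(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


top = "CGCGGAAGACTTAAGAAGAAACTGCCGTTCCACCTGCTGTACGGTAC"   # BD/rnF, 5'->3'
bottom_3to5 = "CTTCTGAATTCTTCTTTGACGGCAAGGTGGACGACATGC"     # BD/rnR as printed, 3'->5'

# Duplex check: BD/rnR pairs with BD/rnF minus its 4-nt overhangs.
assert top[:4] == "CGCG" and top[-4:] == "GTAC"
assert revcomp(top[4:-4]) == bottom_3to5[::-1]

junction = "ATTAAA" + top + "C"     # ...Ile-Lys | linker | KpnI site completed by vector
protein = translate(junction)
renin_cut = protein.index("LL") + 1  # renin cleaves between the two Leu residues
tail = protein[5:renin_cut]          # residues left after BDNF's ...IKRGR

print(protein)  # IKRGRLKKKLPFHLLYGT
print(tail)     # LKKKLPFHL
```

Assuming cleavage between the two leucines as stated for Figure 36, renin leaves Leu-Lys-Lys-Lys-Leu-Pro-Phe-His-Leu attached to BDNF's carboxy-terminal ...Ile-Lys-Arg-Gly-Arg, which is the substrate for the carboxypeptidase steps described later.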



The BDNF/Renin linker is ligated in excess (5x molar) to 400 ng of the MluI-digested BDNF PCR product for 90 min at 20°C in a 20 µl reaction comprising 100 units T4 ligase (NEB), 5% PEG 8000, 50 mM Tris-HCl pH 7.8, 10 mM MgCl2 and 1 mM ATP. Excess linker is removed by spin chromatography at 4°C using a CHROMA SPIN 100 (Clontech) pre-equilibrated with TE buffer (pH 8.0) containing 100 mM NaCl.

The mammalian expression vector pTVMam-Ren (Example 4b) is prepared for the insertion of the BDNF/linker insert by removing the linker present in the vector as follows. Ten micrograms of pTVMam-Ren is digested with 10 U HindIII (NEB) in a 50 µl volume for 90 min at 37°C; the cohesive ends are then filled in by adding 5 units of Klenow fragment and 2.5 µl of 0.5 mM each dNTP. The reaction is incubated for 15 min at 30°C. The Klenow fragment is heat inactivated for 10 min at 75°C. Ten units of KpnI is then added and the reaction is incubated for 90 min at 37°C. The vector is separated from the dNTPs, enzymes and digestion products using spin chromatography (CHROMA SPIN 1000, Clontech).

Two hundred nanograms of the prepared vector is combined with 70 ng of the purified BDNF/linker insert in a 20 µl volume containing 1X ligase buffer (NEB) and 100 U of T4 ligase (NEB), and the reaction mixture is incubated at M^'C for 12 hours. The ligation products are transformed into competent JM101 E. coli cells and the transformed cells are plated on LB/Ampicillin plates. Individual clones are picked and grown as 1 ml overnight cultures in LB/Amp medium at 37°C and 240 rpm. Plasmid DNA is isolated from several positive clones that release a 350 bp fragment upon NcoI restriction digestion. The nucleotide sequence of the positive clones is determined to confirm that they contain an authentic BDNF sequence and a correct linker.

Figure 35 provides a schematic map of the pTVM-R-BDNF vector. The location of the cytomegalovirus (CMV) promoter, the preproBDNF sequences, the junction region, the IgG fragment, the bovine growth hormone (BGH) poly A site, the SV40 origin of replication, the neomycin-resistance gene and the ampicillin-resistance gene are indicated. The direction of transcription is indicated by the use of arrows inside the circle.
Figure 36 shows the nucleotide and amino acid sequences present at the junction
protein, the hydrophilic spacer, the renin recognition site (the site of cleavage is indicated by the arrow pointing between the Leu-Leu residues) and the amino-terminal end of the IgG fragment (the affinity domain) are indicated. As shown in Figure 36, the hydrophilic spacer contains a leucine and three lysines immediately following the arginine residue which is present at the carboxy-terminus of BDNF. This hydrophilic spacer separates the authentic carboxy-terminus from the renin recognition sequence and the KpnI-IgG Fc fragment. The lysines provide a hydrophilic spacer that is resistant to carboxypeptidase Y digestion at pH 5.75 [Klarskov, Anal. Biochem. 180:28 (1989)], while the leucine residue provides a barrier to CPB digestion so that authentic BDNF can be generated with a final CPA digest.

iii) Construction Of A Furin Expression Vector To 
Enhance Pro Processing In Vivo 

BDNF is expressed as a proprotein, and formation of the mature, active form of BDNF requires that the pro region be proteolytically cleaved following the pro processing site comprising Arg-Val-Arg-Arg. This sequence has been well characterized as a furin recognition site [Hatsuzawa, supra and van de Ven, supra]. In experiments designed to test whether furin was responsible for the inability of LoVo cells to conduct pro processing, CHO cells were co-transfected with constructs capable of expressing wild-type furin and prorenin [Takahashi, et al., Biochem. Biophys. Res. Comm. 195:1019 (1993)]. CHO cells were also transfected with the prorenin construct alone. The co-transfected cells showed a much greater ability to process prorenin into mature renin than did the cells transfected with the prorenin construct alone. These studies demonstrate the utility of expressing furin in cell lines used to process pro regions from the protein of interest in vivo. Accordingly, a construct capable of expressing wild-type furin is co-transfected with the plasmid encoding proBDNF to ensure complete processing (it is noted that furin may be expressed from a separate plasmid or from the same plasmid as that which encodes proBDNF).

Furin is the enzyme responsible for constitutive processing and is expressed in all tissues and in most cell lines studied to date [Hatsuzawa, et al., J. Biol. Chem. 265:22075 (1990) and Schalken et al., J. Clin. Invest. 80:1545 (1987)]. The cDNA sequence of human furin has been described [Van de Ouweland, et al., Nuc. Acids Res. 18:664 (1990)]. To generate a plasmid capable of expressing furin in mammalian cells, human furin cDNA sequences are cloned and inserted into an expression vector as follows.
A synthetic 30-nucleotide oligonucleotide which corresponds to the first ten amino acids of the translated furin gene is labeled with alkaline phosphatase (Lightsmith Luminescence Engineering System, Promega) using the manufacturer's protocols. This labeled oligonucleotide is used to screen a λgt11 human kidney cDNA library (Clontech). The sequence of this oligonucleotide is: 5'-ATGGAGCTGAGGCCCTGGTTGCTATGGGTG-3' (SEQ ID NO:89). The library is screened by hybridization of the labeled oligonucleotide to nitrocellulose filters lifted off of plates containing amplified bacteriophage plaques [the filters are generated using standard protocols such as Ausubel, et al., Short Protocols in Molecular Biology, Second Ed., John Wiley and Sons (1992), 6.1-6.2].
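The statement that the 30-nucleotide probe corresponds to the first ten amino acids of furin can be checked by translating it. The sketch below uses the standard genetic code; the probe sequence is SEQ ID NO:89 as transcribed from the text.

```python
# Sketch: translate the furin screening probe (SEQ ID NO:89) with the standard genetic code.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str) -> str:
    """Translate complete codons of a 5'->3' DNA sequence into one-letter residues."""
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


probe = "ATGGAGCTGAGGCCCTGGTTGCTATGGGTG"
print(len(probe), translate(probe))  # 30 MELRPWLLWV
```

The ten residues (Met-Glu-Leu-Arg-Pro-Trp-Leu-Leu-Trp-Val) begin with the initiating methionine, consistent with a probe designed to start at the ATG codon.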

Hybridization of the alkaline phosphatase-labeled oligonucleotide to the filters is carried out using the manufacturer's protocols (Promega). Ten positive plaques are purified and DNA is isolated from each plaque. The isolated DNA is digested with SmaI to confirm the presence of the fragment containing the complete open reading frame for human furin cDNA (3209 bp). The 3.2 kb SmaI fragment is purified on a low melting temperature agarose gel and the purified fragment is ligated into the SmaI site of pUC18 to generate pUC/FUR clones (see Figure 37) in preparation for cloning the furin cDNA into the pSV2neo vector (Clontech).

Figure 37 provides a schematic map of the pUC/FUR construct. The locations of the start (ATG) and stop codons of the furin cDNA are indicated by "START" and "STOP," respectively. The direction of transcription of the furin cDNA is indicated by the dark black arrow. The location of the ampicillin-resistance gene ("Amp") is indicated. Selected restriction endonuclease recognition sites are also indicated.

The pUC/FUR clones are screened for proper orientation of the furin cDNA within the multiple cloning site by the release of a 2902 bp fragment upon digestion with HindIII and EcoRV. The 2.9 kb HindIII/EcoRV fragment from a pUC/FUR clone containing the furin cDNA in the desired orientation is ligated into pSV2neo which has been digested with HindIII and HpaI. This manipulation replaces the neo gene of pSV2neo with the furin cDNA, places the furin cDNA under the control of the SV40 early promoter and provides the necessary polyadenylation and processing signals. The resulting construct is termed pSV2-fur.

Figure 38 is a schematic map of the pSV2-fur construct. The locations of the start (ATG) and stop codons of the furin cDNA are indicated by "START" and "STOP," respectively. The direction of transcription of the furin cDNA is indicated by the dark black arrow. The SV40 early promoter is represented by the open arrow. Selected restriction endonuclease recognition sites are indicated.

E. coli cells containing pSV2-fur are grown in LB/Amp (500 ml) and plasmid DNA is isolated using standard techniques (e.g., cesium chloride density centrifugation). The isolated pSV2-fur plasmid DNA is then digested with EcoRI in preparation for co-transfection into CHO cells as described below.

iv) Introduction Of pTVM-R-BDNF And pSV2-fur 
Expression Vectors Into Mammalian Host Cells 
And Isolation Of Authentic BDNF 

CHO cells are one of the preferred cell lines for expression of recombinant fusion proteins (other preferred cell lines include mouse myeloma Sp2/0, fibroblast cell lines and COS cells). CHO cells naturally produce furin; however, the endogenous level of furin production is insufficient to process recombinant pro-proteins which are expressed at high levels using viral promoters (e.g., the SV40 promoter) to drive the expression of the recombinant protein [Takahashi et al., supra and Yanagita et al., Endocrinology 133:639 (1993)]. In order to ensure that all of the recombinant BDNF is proteolytically processed into mature BDNF in the transfected mammalian cells, a construct capable of expressing furin (e.g., pSV2-fur) is co-transfected with the BDNF expression construct (pTVM-R-BDNF). These plasmids are linearized prior to transfection into mammalian cells (the plasmids are cut with a restriction enzyme which does not cut within sequences necessary for the expression of either furin or BDNF).

CHO cells are co-transfected with equimolar amounts of the linearized pSV2-fur and pTVM-R-BDNF plasmids using the calcium phosphate co-precipitation procedure [Graham and van der Eb, Virol. 52:456 (1973)]. The transfected cells are grown in non-selective medium [Dulbecco's Modified Eagle's Medium (DMEM) (Sigma) containing 10% FBS (Gibco)] in an incubator containing 5% CO2 at 37°C for 48 hours. After 48 hours in non-selective medium, the cells are transferred into DMEM containing 10% FBS and 1.5X the killing dose of G418 (about 800 µg/ml for CHO cells; the killing dose of G418 is empirically determined for each cell line to be used). The selective medium (i.e., DMEM containing G418) is changed every 2 days. Cells which survive growth in the G418-containing medium for 12 days are diluted to 10 cells/ml with DMEM containing 10% FBS and G418
in the wells are then grown to confluence and the levels of human IgG are determined by dot blot analysis as described in Example 1 (ammonium sulfate may be used to concentrate the culture supernatant prior to dot blot analysis).
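Plating a dilute suspension yields single-cell clones only in a statistical sense, since the number of cells per well follows approximately a Poisson distribution. The sketch below estimates the fractions of empty, single-cell and multi-cell wells; 10 cells/ml comes from the text, while the 0.1 ml plating volume per well is a hypothetical value because the plating details are not given here.

```python
# Sketch: Poisson estimate of well occupancy in limiting-dilution cloning.
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k events for a Poisson distribution with this mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def well_occupancy(cells_per_ml: float, ml_per_well: float) -> dict:
    """Expected fractions of wells receiving 0, 1 or 2+ cells."""
    mean = cells_per_ml * ml_per_well
    p0, p1 = poisson_pmf(0, mean), poisson_pmf(1, mean)
    return {"empty": p0, "single": p1, "multiple": 1.0 - p0 - p1}


if __name__ == "__main__":
    # 10 cells/ml is from the text; 0.1 ml per well is a hypothetical plating volume.
    for label, fraction in well_occupancy(10.0, 0.1).items():
        print(f"{label:8s} {fraction:.3f}")
```

With these values about 37% of wells would be empty, 37% would receive one cell and 26% more than one; plating fewer cells per well raises the proportion of growing wells that are clonal, at the cost of more empty wells.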

Clones which express high levels of human IgG1 Fc fragment are expanded into 250 ml flasks; the selective medium is changed every 2 days. In the final expansion, the medium contains 10% immunoglobulin-free (< 1 µg/ml) fetal bovine serum in place of 10% FBS. Media supplements such as bovine milk (Accurate Chemical and Scientific Corp., Westbury, NY), serum-free media (HyQ-CCM1, HyClone Laboratories, Logan, UT) and protein-free media (JRH Biosciences, Lenexa, KS) can also be used. The protease inhibitors pepstatin and leupeptin (1 µg/ml; Sigma) are included in the medium to inhibit any contaminating renin or other proteases. Complete elimination of serum immunoglobulin from the medium is not essential because the protocol used for purification of the recombinant fusion protein cleaves the desired protein away from the Fc fusion molecule while the fusion protein is bound to the immobilization matrix. In this example, the specific endoprotease employed, renin, is unlikely to cleave the contaminating IgG because renin has a high degree of specificity; therefore, the contaminating IgG would remain bound to the Protein A matrix. The use of low-immunoglobulin medium only reduces non-specific Protein A binding events (i.e., binding of non-fusion protein) that would saturate the IgG binding matrix very quickly if medium containing conventional serum were utilized. As an alternative to the use of low-immunoglobulin medium, serum-free medium may be employed. A serum-free medium suitable for the growth of CHO cells is described in U.S. Patent No. 5,122,469, the disclosure of which is herein incorporated by reference.

The desired clones are grown to confluency over 2 days and the medium is harvested and clarified by centrifugation at 1500 x g. The level of production of the fusion protein is determined by assaying for human IgG1 Fc expression using the dot blot protocol described in Example 1. The supernatant is diluted with an equal volume of Tris buffer (Tris-HCl, pH 8.6, 250 mM NaCl, 0.02% sodium azide) and passed over an immobilized Protein A matrix at a rate of 1 ml/min (Protein A Actidisk, Arbor Technologies). The Protein A matrix is extensively washed with Tris buffer to remove any non-specific proteins. The matrix is washed with 10 ml of an intermediate Tris buffer of pH 7.0 (Tris-HCl, pH 7.0, 250 mM NaCl, 0.02% sodium azide) before washing with 20 ml of renin cleavage buffer (50 mM sodium phosphate, pH 6.^, 250 mM NaCl, 5 mM EGTA, 2 mM PMSF). Five milliliters of
circulated through the disk at 37°C at a rate of 100 µl/min for 2 hours. The efficiency of renin cleavage is monitored by measuring protein levels (absorbance at 595 nm) in 10 µl samples of the circulating cleavage solution mixed with 500 µl of modified Coomassie blue solution (Sigma). When the expected levels of protein are released (determined from the amount of fusion protein loaded onto the matrix, as measured in the initial dot blot assay), the circulating flow is collected, and the circulating loop is washed with 3 ml of renin cleavage buffer, which is pooled with it. Collected fractions from several stable cell lines are analyzed on a 10% SDS-PAGE gel to determine whether co-transfection was successful. Released BDNF will migrate below the 16.5 kD lysozyme marker if correctly processed and above the lysozyme marker if unprocessed.

The collected fractions containing the cleaved BDNF and renin are concentrated using a Centriprep-3 (Amicon) to a concentration of 2 mg/ml and then separated by gel filtration chromatography on a Sephadex G-50 column equilibrated with ammonium carbonate buffer (50 mM ammonium carbonate pH 8.5, 150 mM NaCl, 5 mM EDTA). The processed BDNF is collected in fractions, pooled and concentrated to 1 mg/ml using a Centriprep-3 (Amicon). The concentrated BDNF is incubated with immobilized carboxypeptidase A at 2 units enzyme/ml substrate [as described above in section (a)(ii)] for 120 minutes at 25°C with end-over-end rotation to remove the first three amino acids of the remaining renin recognition sequence (Leu-1', His-2', Phe-3'). This reaction stops at the proline residue due to carboxypeptidase A's limited cleavage specificity (Ambler, supra). The released amino acids are removed and the buffer is changed to 100 mM sodium citrate (pH 5.75) using gel filtration chromatography on a Sephadex G-25 column. The void volume containing the cleaved BDNF is passed three times through an Actidisk containing 4 mg immobilized CPD-Y (Example 5) at a flow rate that will remove 100% of the proline residues in the first passage through the Actidisk [approximately 1 ml/min (experimental determination of the activity of each disk is required as described in Example 5)]. This procedure completely removes the proline residue that is left after the CPA digestion.

The buffer is then changed back to the ammonium carbonate buffer as described above and the sample is concentrated to 1 mg/ml using a Centricon-3 cartridge (Amicon) for the CPA digestion. The sample is incubated with immobilized CPA (2 units/ml substrate) for 180 minutes as described above to remove the leucine and lysine residues that remain after the CPD-Y flow digestion. This reaction stops at the arginine residue at the carboxy-terminal
filtration through a Sephadex G-25 column. Additional chromatography steps (e.g., ion exchange, gel filtration, RP-HPLC and/or FPLC) may be employed to gain even higher purity of the recombinant BDNF.

It is clear from the foregoing that the present invention provides compositions (fusion proteins, recombinant expression vectors) and methods which permit the production of recombinant proteins which contain only those amino acids found in the protein of interest.
SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: SGARLATO, GREGORY D.

(ii) TITLE OF INVENTION: PROTEIN EXPRESSION SYSTEM

(iii) NUMBER OF SEQUENCES: 90

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: MEDLEN & CARROLL

(B) STREET: 220 MONTGOMERY STREET, SUITE 2200

(C) CITY: SAN FRANCISCO 

(D) STATE: CALIFORNIA 

(E) COUNTRY: UNITED STATES OF AMERICA 

(F) ZIP: 94104 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: US 09/595.043 

(B) FILING DATE: 31-JAN-1996 

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: CARROLL, PETER G. 

(B) REGISTRATION NUMBER: 32,837 

(C) REFERENCE/DOCKET NUMBER: SGAR-00371 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: (415) 705-8410 

(B) TELEFAX: (415) 397-8338 

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 8 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

Asp Tyr Lys Asp Asp Asp Asp Lys

1 5

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Pro Xaa Gly Pro Xaa

1
(A) LENGTH: 7 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

Pro Phe His Leu Leu Val Tyr
1 5

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(ix) FEATURE:

(A) NAME/KEY: Peptide

(B) LOCATION: 5

(D) OTHER INFORMATION: /note= "The amino acid at this
location can be any amino acid except proline or arginine."

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

Ile Glu Gly Arg Xaa
1 5

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(ix) FEATURE:

(A) NAME/KEY: Peptide

(B) LOCATION: 5

(D) OTHER INFORMATION: /note= "The amino acid at this
location can be any amino acid except proline or arginine."

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

Ile Asp Gly Arg Xaa



(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(ix) FEATURE:

(A) NAME/KEY: Peptide
(B) LOCATION: 5
(D) OTHER INFORMATION: /note= "The amino acid at this
location can be any amino acid except proline or arginine."

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

Ala Glu Gly Arg Xaa

1 5

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

Leu Val Pro Arg Gly Ser
1 5

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 10 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

Gln Gly Pro Gly Gln Lys Gln Lys Gln Lys
1 5 10

(2) INFORMATION FOR SEQ ID NO : 9 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

Phe Arg Ser Val 
1 

(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

Val Pro Phe Arg
1

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(ix) FEATURE:

(A) NAME/KEY: Peptide

(B) LOCATION: 6

(D) OTHER INFORMATION: /note= "The amino acid at this
location is any non-acidic amino acid."

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

Leu Val Pro Arg Gly Xaa
1 5

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(ix) FEATURE:

(A) NAME/KEY: Peptide

(B) LOCATION: 2

(D) OTHER INFORMATION: /note= "The amino acid at this
location can be either leucine, phenylalanine, isoleucine,
valine, alanine or tryptophan."

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

Arg Xaa Val Arg Gly

1 5

(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

Asp Asp Asp Asp Lys

1 5

(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
Arg Xaa Arg Arg



(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
Arg Xaa Lys Arg



(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
Arg Arg Lys



(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3 amino acids 

(B) TYPE: amino acid

(C) STRANDEDNESS : 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: peptide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
Arg Lys Lys 



(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
Lys Arg Lys



(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
Lys Lys Lys



(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:
Arg Arg Arg Lys



(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
Arg Arg Lys Lys



(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids

(B) TYPE: amino acid
(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
Lys Arg Arg Lys



(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4 amino acids
(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:

Arg Lys Arg Lys
1 

(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

Arg Lys Lys Lys
1 

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: amino acid 
(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:

Lys Arg Lys Lys 
1 

(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 

Lys Lys Arg Lys 
1 

(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:

Arg Arg Arg Arg Lys
1 

(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:

Arg Arg Arg Lys Lys 



(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:
Arg Arg Lys Arg Lys 



(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:
Arg Lys Arg Arg Lys 



(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids

(B) TYPE: amino acid
(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:
Lys Arg Arg Arg Lys 



(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:
Arg Arg Lys Lys Lys

(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 
Arg Lys Arg Lys Lys 

(2) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 

Arg Lys Lys Arg Lys 
1 5 

(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:

Lys Arg Arg Lys Lys 

1 5 

(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:

Lys Arg Lys Arg Lys 
1 

(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
Lys Arg Arg Lys Lys 



(2) INFORMATION FOR SEQ ID NO: 38:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:

Arg Lys Lys Lys Lys
1 5

(2) INFORMATION FOR SEQ ID NO: 39:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:
Arg Ser Lys Arg 



(2) INFORMATION FOR SEQ ID NO: 40:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:
Arg Xaa Xaa Arg 



(2) INFORMATION FOR SEQ ID NO: 41:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 9 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:

Arg Arg Lys Leu Asp Asp Asp Asp Lys 

1 5 

(2) INFORMATION FOR SEQ ID NO: 42:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 8 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:

Arg Lys Lys Lys Leu Val Pro Arg

1 5 

(2) INFORMATION FOR SEQ ID NO: 43:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown
(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:

Leu Val Pro Arg Gly Thr 

1 5 

(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: 
CCGGGCGCGC GCGC 

(2) INFORMATION FOR SEQ ID NO: 45:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 44 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:

CGTTTCGCCG GCTGGTTCCG CGGGGTCGAC GGATTCAGCT AGCA 

(2) INFORMATION FOR SEQ ID NO: 46:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 52 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46:
AGCTTGCTAG CTGAATCCGT CGACCCCGCG GAACCAGCCG GCGAAACGAG CT 52

(2) INFORMATION FOR SEQ ID NO: 47:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:47: 
CGTTTAAAAA GAAACCGCGG GGCCCGGGTA C 31 
(2) INFORMATION FOR SEQ ID NO: 48:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48: 
CCGGGCCCCG CGGTTTCTTT TTAAACGAGC T 31 
(2) INFORMATION FOR SEQ ID NO: 49:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 699 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..693

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:

GAG CCC AAA TCT TGT GAC AAA ACT CAC ACA TGC CCA CCG TGC CCA GCA 48

Glu Pro Lys Ser Cys Asp Lys Thr His Thr Cys Pro Pro Cys Pro Ala
1 5 10 15

CCA GAA CTC CTG GGG GGA CCG TCA GTC TTC CTC TTC CCC CCA AAA CCC 96

Pro Glu Leu Leu Gly Gly Pro Ser Val Phe Leu Phe Pro Pro Lys Pro

20 25 30

AAG GAC ACC CTC ATG ATT TCT CGG ACC CCT GAG GTC ACA TGC GTG GTG 144

Lys Asp Thr Leu Met Ile Ser Arg Thr Pro Glu Val Thr Cys Val Val
35 40 45

GTG GAC GTG AGC CAC GAA GAC CCT GAG GTC AAG TTC AAC TGG TAC GTG 192
Val Asp Val Ser His Glu Asp Pro Glu Val Lys Phe Asn Trp Tyr Val

50 55 60

GAC GGC GTG GAG GTG CAT AAT GCC AAG ACA AAG CCG CGG GAG GAG CAG 240
Asp Gly Val Glu Val His Asn Ala Lys Thr Lys Pro Arg Glu Glu Gln
65 70 75 80

TAC AAC AGC ACG TAC CGG GTG GTC AGC GTC CTC ACC GTC CTG CAC CAG 288

Tyr Asn Ser Thr Tyr Arg Val Val Ser Val Leu Thr Val Leu His Gln

85 90 95

GAC TGG CTG AAT GGC AAG GAG TAC AAG TGC AAG GTC TCC AAC AAA GCC 336

Asp Trp Leu Asn Gly Lys Glu Tyr Lys Cys Lys Val Ser Asn Lys Ala

100 105 110

CTC CCA GCC CCC ATC GAG AAA ACC ATC TCC AAA GCC AAA GGG CAG CCC 384

Leu Pro Ala Pro Ile Glu Lys Thr Ile Ser Lys Ala Lys Gly Gln Pro

115 120 125

CGA GAA CCA CAG GTG TAC ACC CTG CCC CCA TCC CGG GAT GAG CTG ACC 432

Arg Glu Pro Gln Val Tyr Thr Leu Pro Pro Ser Arg Asp Glu Leu Thr

130 135 140



AAG AAC CAG GTC AGC CTG ACC TGC CTG GTC AAA GGC TTC TAT CCC AGC 480
Lys Asn Gln Val Ser Leu Thr Cys Leu Val Lys Gly Phe Tyr Pro Ser
145 150 155 160

GAC ATC GCC GTG GAG TGG GAG AGC AAT GGG CAG CCG GAG AAC AAC TAC 528

Asp Ile Ala Val Glu Trp Glu Ser Asn Gly Gln Pro Glu Asn Asn Tyr
165 170 175

AAG ACC ACG CCT CCC GTG CTG GAC TCC GAC GGC TCC TTC TTC CTC TAC 576
Lys Thr Thr Pro Pro Val Leu Asp Ser Asp Gly Ser Phe Phe Leu Tyr
180 185 190

AGC AAG CTC ACC GTG GAC AAG AGC AGG TGG CAG CAG GGG AAC GTC TTC 624
Ser Lys Leu Thr Val Asp Lys Ser Arg Trp Gln Gln Gly Asn Val Phe
195 200 205

TCA TGC TCC GTG ATG CAT GAG GCT CTG CAC AAC CAC TAC ACG CAG AAG 672
Ser Cys Ser Val Met His Glu Ala Leu His Asn His Tyr Thr Gln Lys
210 215 220

AGC CTC TCC CTG TCT CCG GGT AAA TGA 699
Ser Leu Ser Leu Ser Pro Gly Lys *
225 230

(2) INFORMATION FOR SEQ ID NO: 50:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 233 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50:

Glu Pro Lys Ser Cys Asp Lys Thr His Thr Cys Pro Pro Cys Pro Ala

1 5 10 15

Pro Glu Leu Leu Gly Gly Pro Ser Val Phe Leu Phe Pro Pro Lys Pro

20 25 30

Lys Asp Thr Leu Met Ile Ser Arg Thr Pro Glu Val Thr Cys Val Val
35 40 45

Val Asp Val Ser His Glu Asp Pro Glu Val Lys Phe Asn Trp Tyr Val

50 55 60

Asp Gly Val Glu Val His Asn Ala Lys Thr Lys Pro Arg Glu Glu Gln

65 70 75 80

Tyr Asn Ser Thr Tyr Arg Val Val Ser Val Leu Thr Val Leu His Gln

85 90 95

Asp Trp Leu Asn Gly Lys Glu Tyr Lys Cys Lys Val Ser Asn Lys Ala

100 105 110

Leu Pro Ala Pro Ile Glu Lys Thr Ile Ser Lys Ala Lys Gly Gln Pro
115 120 125

Arg Glu Pro Gln Val Tyr Thr Leu Pro Pro Ser Arg Asp Glu Leu Thr
130 135 140

Lys Asn Gln Val Ser Leu Thr Cys Leu Val Lys Gly Phe Tyr Pro Ser
145 150 155 160

Asp Ile Ala Val Glu Trp Glu Ser Asn Gly Gln Pro Glu Asn Asn Tyr
165 170 175

Lys Thr Thr Pro Pro Val Leu Asp Ser Asp Gly Ser Phe Phe Leu Tyr
180 185 190

Ser Lys Leu Thr Val Asp Lys Ser Arg Trp Gln Gln Gly Asn Val Phe
195 200 205

Ser Cys Ser Val Met His Glu Ala Leu His Asn His Tyr Thr Gln Lys
210 215 220

Ser Leu Ser Leu Ser Pro Gly Lys *

225 230

(2) INFORMATION FOR SEQ ID NO: 51:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 34 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51:
CCCCCGCCGG CACACATGCC CACCGTCGCC AGCA 34

(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:
CCCCCGTCGA CGGACATGCC CACCGTGCCC A 31 
(2) INFORMATION FOR SEQ ID NO: 53:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 35 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53:
GGGGTACCCA CACATGCCCA CCGTGCCCAG CACCT 35

(2) INFORMATION FOR SEQ ID NO: 54:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 35 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54:

CCCCCGCTAG CGTCATTTAC CCGGAGACAG GGAGA

(2) INFORMATION FOR SEQ ID NO: 55:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 35 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55:

CATGGACTGA AAGCTTGACG GTACCTGAGC TAGCT 

(2) INFORMATION FOR SEQ ID NO: 56:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 35 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56: 

AGCTAGCTAG CTCAGGTACC GTCAAGCTTT CAGTC 

(2) INFORMATION FOR SEQ ID NO: 57: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 34 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57:

CATGAAACAA AGCACTATTG CACTGGCTTT ACCG

(2) INFORMATION FOR SEQ ID NO: 58:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: :3 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58:

(2) INFORMATION FOR SEQ ID NO: 59:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 36 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59:
AGCTTTTGTC ACAGGGGTAA ACAGTAACGG TAAAGC 36
(2) INFORMATION FOR SEQ ID NO: 60:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60:
CAGTGCAATA GTGCTTTGTT T 21 
(2) INFORMATION FOR SEQ ID NO: 61:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61:

Met Lys Gln Ser Thr Ile Ala Leu Ala Leu Pro Leu Leu Phe Thr Pro

1 5 10 15

Val Thr Lys Ala 

20 

(2) INFORMATION FOR SEQ ID NO: 62:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 49 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62:
CTAGCTGATC GCGAAAGAAG CTTCCGTTCC ACCTGCTGGT GTACGGTAC 49

(2) INFORMATION FOR SEQ ID NO: 63:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 41 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63:
CGTACACCAG CAGGTGGAAC GGAAGCTTCT TTCGCGATCA G
(2) INFORMATION FOR SEQ ID NO: 64:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 49 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64:

GATCTTCGCG AAAGAAGAAG CTTCCGTTTC ACCTGCTGGT CTACGGTAC

(2) INFORMATION FOR SEQ ID NO: 65:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 41 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65:

CGTAGACCAG CAGGTGAAAC GGAAGCTTCT TCTTTCGCGA A 

(2) INFORMATION FOR SEQ ID NO: 66:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 10 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66:
CTAGCCCCCC 

(2) INFORMATION FOR SEQ ID NO: 67:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 10 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67:
GATCGGGGGG 

(2) INFORMATION FOR SEQ ID NO: 68:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 37 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68:
GATCTTCGCG AAAGAAGAAG CTGGTTCCGC GGGGTAC 37
(2) INFORMATION FOR SEQ ID NO: 69:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 29 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69:
CCCGCGGAAC CAGCTTCTTC TTTCGCGAA 29
(2) INFORMATION FOR SEQ ID NO: 70: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 12 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70:

Ala Leu Lys Asp Ala Gln Thr Asn Ser Ser Ser Phe
1 5 10

(2) INFORMATION FOR SEQ ID NO: 71: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown
(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71:

Phe Leu Ala Pro Arg Gly Thr

1 5 

(2) INFORMATION FOR SEQ ID NO: 72:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72:

Ala Pro Tyr Gly Pic Pro
1 

(2) INFORMATION FOR SEQ ID NO: 73:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 10 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73:
Pro Leu Ser Arg he.w Ser Val Ala Lys Lys 

(2) INFORMATION FOR SEQ ID NO: 74:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 726 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: CDS
(B) LOCATION: 1..726

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74:

ATG TCC ATG TTG TTC TAC ACT CTG ATC ACA GCT TTT CTG ATC GGC ATA 48
Met Ser Met Leu Phe Tyr Thr Leu Ile Thr Ala Phe Leu Ile Gly Ile
1 5 10 15

CAG GCG GAA CCA CAC TCA GAG AGC AAT GTC CCT GCA GGA CAC ACC ATC 96
Gln Ala Glu Pro His Ser Glu Ser Asn Val Pro Ala Gly His Thr Ile

20 25 30

CCC CAA GTC CAC TGG ACT AAA CTT CAG CAT TCC CTT GAC ACT GCC CTT 144

Pro Gln Val His Trp Thr Lys Leu Gln His Ser Leu Asp Thr Ala Leu
35 40 45

CGC AGA GCC CGC AGC GCC CCG GCA GCG GCG ATA GCT GCA CGC GTG GCG 192
Arg Arg Ala Arg Ser Ala Pro Ala Ala Ala Ile Ala Ala Arg Val Ala

50 55 60

GGG CAG ACC CGC AAC ATT ACT GTG GAC CCC AGG CTG TTT AAA AAG CGG 240

Gly Gln Thr Arg Asn Ile Thr Val Asp Pro Arg Leu Phe Lys Lys Arg

65 70 75 80

CGA CTC CGT TCA CCC CGT GTG CTG TTT AGC ACC CAG CCT CCC CGT GAA 288

Arg Leu Arg Ser Pro Arg Val Leu Phe Ser Thr Gln Pro Pro Arg Glu

85 90 95

GCT GCA GAC ACT CAG GAT CTG GAC TTC GAG GTC GGT GGT GCT GCC CCC 336

Ala Ala Asp Thr Gln Asp Leu Asp Phe Glu Val Gly Gly Ala Ala Pro
100 105 110

TTC AAC AGG ACT CAC AGG AGC AAG CGG TCA TCA TCC CAT CCC ATC TTC 384
Phe Asn Arg Thr His Arg Ser Lys Arg Ser Ser Ser His Pro Ile Phe
115 120 125

CAC AGG GGC GAA TTC TCT GTC TGT GAC AGT GTC AGC GTG TGG GTT GGG 432
His Arg Gly Glu Phe Ser Val Cys Asp Ser Val Ser Val Trp Val Gly
130 135 140

GAT AAG ACC ACC GCC ACA GA^ ATC AAG GGC AAG GAG GTG ATG GTG TTG 480

Asp Lys Thr Thr Ala Thr Asp Ile Lys Gly Lys Glu Val Met Val Leu

145 150 155 160

GGA GAG GT3 AAC ATT AAC AAC ACT GTA TTC AAA CAG TAG TTT TTT GAG 528

Gly Glu Val Asn Ile Asn Asn Ser Val Phe Lys Gln Tyr Phe Phe Glu

165 170 175

ACC AAG TGC CGG CCA AAT CCC GTT GAC AGC GGG TGC CGG GGC ATT 576

Thr Lys Cys Arg Asp Pro Asn Pro Val Asp Ser Gly Cys Arg Gly Ile
180 185 190

GAC TCA AAG CAC TGG AAC CCA TAT TGT ACC ACG ACT CAC ACC TTT GTC 624
Asp Ser Lys His Trp Asn Ser Tyr Cys Thr Thr Thr His Thr Phe Val
195 200 205

AAG GCG CTG ACC ATG GAT GGC AAG CAG GCT GCC TGG CGG TTT ATC CGG 672

Lys Ala Leu Thr Met Asp Gly Lys Gln Ala Ala Trp Arg Phe Ile Arg
210 215 220

ATA GAT ACG GCC TGT GTG TGT GTG CTC AGC AGG AAG GCT GTG AGA AGA 720
Ile Asp Thr Ala Cys Val Cys Val Leu Ser Arg Lys Ala Val Arg Arg
225 230 235 240

GCC TGA 726

Ala * 

(2) INFORMATION FOR SEQ ID NO: 75:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 242 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 75:

Met Ser Met Leu Phe Tyr Thr Leu Ile Thr Ala Phe Leu Ile Gly Ile
1 5 10 15

Gln Ala Glu Pro His Ser Glu Ser Asn Val Pro Ala Gly His Thr Ile

20 25 30

Pro Gln Val His Trp Thr Lys Leu Gln His Ser Leu Asp Thr Ala Leu
35 40 45

Arg Arg Ala Arg Ser Ala Pro Ala Ala Ala Ile Ala Ala Arg Val Ala

50 55 60

Gly Gln Thr Arg Asn Ile Thr Val Asp Pro Arg Leu Phe Lys Lys Arg
65 70 75 80

Arg Leu Arg Ser Pro Arg Val Leu Phe Ser Thr Gln Pro Pro Arg Glu
85 90 95

Ala Ala Asp Thr Gln Asp Leu Asp Phe Glu Val Gly Gly Ala Ala Pro

100 105 110

Phe Asn Arg Thr His Arg Ser Lys Arg Ser Ser Ser His Pro Ile Phe

115 120 125

His Arg Gly Glu Phe Ser Val Cys Asp Ser Val Ser Val Trp Val Gly
130 135 140

Asp Lys Thr Thr Ala Thr Asp Ile Lys Gly Lys Glu Val Met Val Leu
145 150 155 160

Gly Glu Val Asn Ile Asn Asn Ser Val Phe Lys Gln Tyr Phe Phe Glu

165 170 175

Thr Lys Cys Arg Asp Pro Asn Pro Val Asp Ser Gly Cys Arg Gly Ile
180 185 190

Asp Ser Lys His Trp Asn Ser Tyr Cys Thr Thr Thr His Thr Phe Val
195 200 205

Lys Ala Leu Thr Met Asp Gly Lys Gln Ala Ala Trp Arg Phe Ile Arg
210 215 220

Ile Asp Thr Ala Cys Val Cys Val Leu Ser Arg Lys Ala Val Arg Arg
225 230 235 240

Ala *
(2) INFORMATION FOR SEQ ID NO: 76:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 744 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: CDS
(B) LOCATION: 1..744

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 76:

ATG ACC ATC CTT TTC CTT ACT ATG GTT ATT TCA TAC TTT GGT TGC ATG 48
Met Thr Ile Leu Phe Leu Thr Met Val Ile Ser Tyr Phe Gly Cys Met
1 5 10 15

AAG GCT GCC CCC ATG AAA GAA GCA AAC ATC CGA GGA CAA GGT GGC TTG 96
Lys Ala Ala Pro Met Lys Glu Ala Asn Ile Arg Gly Gln Gly Gly Leu
20 25 30

GCC TAC CCA GGT GTG CGG ACC CAT GGG ACT CTG GAG AGC GTG AAT GGG 144
Ala Tyr Pro Gly Val Arg Thr His Gly Thr Leu Glu Ser Val Asn Gly
35 40 45

CCC AAG GCA GGT TCA AGA GGC TTG ACA TCA TTG GCT GAC ACT TTC GAA 192
Pro Lys Ala Gly Ser Arg Gly Leu Thr Ser Leu Ala Asp Thr Phe Glu
50 55 60

CAC GTG ATA GAA GAG CTG TTG GAT GAG GAC CAG AAA GTT CGG CCC AAT 240
His Val Ile Glu Glu Leu Leu Asp Glu Asp Gln Lys Val Arg Pro Asn

65 70 75 80

GAA GAA AAC AAT AAG GAC GCA GAC TTG TAC ACG TCC AGG GTG ATG CTC 288
Glu Glu Asn Asn Lys Asp Ala Asp Leu Tyr Thr Ser Arg Val Met Leu
85 90 95

AGT AGT CAA GTG CCT TTG GAG CCT CCT CTT CTC TTT CTG CTG GAG GAA 336

Ser Ser Gln Val Pro Leu Glu Pro Pro Leu Leu Phe Leu Leu Glu Glu
100 105 110

TAC AAA AAT TAC CTA GAT GCT GCA AAC ATG TCC ATG AGG GTC CGG CGC 384
Tyr Lys Asn Tyr Leu Asp Ala Ala Asn Met Ser Met Arg Val Arg Arg
115 120 125

CAC TCT GAC CCT GCC CGC CGA GGG GAG CTG AGC GTG TGT GAC AGT ATT 432

His Ser Asp Pro Ala Arg Arg Gly Glu Leu Ser Val Cys Asp Ser Ile
130 135 140

AGT GAG TGG GTA ACG GCG GCA GAC AAA AAG ACT GCA GTG GA^ ATG TCG 480
Ser Glu Trp Val Thr Ala Ala Asp Lys Lys Thr Ala Val Asp Met Ser

145 150 155 160

GGC GGG ACG GTC ACA GTC CTT GAA AAG GTC CCT GTA TCA AAA GGC CAA 528

Gly Gly Thr Val Thr Val Leu Glu Lys Val Pro Val Ser Lys Gly Gln

165 170 175

CTG AAG CAA TAG TTC TAC GAS ACC AAG TGC AAT CCC ATG GGT TAC ACA 576
Leu Lys Gln Tyr Phe Tyr Glu Thr Lys Cys Asn Pro Met Gly Tyr Thr
180 185 190

AAA GAA GGC TGC AGG GGC ATA GAC AAA AGG CAT TGG AAC TCC CAG TGC 624

Lys Glu Gly Cys Arg Gly Ile Asp Lys Arg His Trp Asn Ser Gln Cys
195 200 205

CGA ACT ACC CAG TCG TAC GTG CGG GCC CTT ACC ATG GAT AGC AAA AAG 672

Arg Thr Thr Gln Ser Tyr Val Arg Ala Leu Thr Met Asp Ser Lys Lys
210 215 220

AGA ATT GGC TGG CCA TTC ATA ACG ATA GAC ACT TCT TGT GTA TGT ACA 720

Arg Ile Gly Trp Arg Phe Ile Arg Ile Asp Thr Ser Cys Val Cys Thr
225 230 235 240

TTG ACC ATT AAA AGG GGA AGA TAG 744

Leu Thr Ile Lys Arg Gly Arg *
245 

(2) INFORMATION FOR SEQ ID NO: 77:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 248 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 77:

Met Thr Ile Leu Phe Leu Thr Met Val Ile Ser Tyr Phe Gly Cys Met
1 5 10 15

Lys Ala Ala Pro Met Lys Glu Ala Asn Ile Arg Gly Gln Gly Gly Leu

20 25 30

Ala Tyr Pro Gly Val Arg Thr His Gly Thr Leu Glu Ser Val Asn Gly
35 40 45

Pro Lys Ala Gly Ser Arg Gly Leu Thr Ser Leu Ala Asp Thr Phe Glu

50 55 60

His Val Ile Glu Glu Leu Leu Asp Glu Asp Gln Lys Val Arg Pro Asn
65 70 75 80

Glu Glu Asn Asn Lys Asp Ala Asp Leu Tyr Thr Ser Arg Val Met Leu
85 90 95

Ser Ser Gln Val Pro Leu Glu Pro Pro Leu Leu Phe Leu Leu Glu Glu
100 105 110

Tyr Lys Asn Tyr Leu Asp Ala Ala Asn Met Ser Met Arg Val Arg Arg

115 120 125

His Ser Asp Pro Ala Arg Arg Gly Glu Leu Ser Val Cys Asp Ser Ile
130 135 140

Ser Glu Trp Val Thr Ala Ala Asp Lys Lys Thr Ala Val Asp Met Ser
145 150 155 160

Gly Gly Thr Val Thr Val Leu Glu Lys Val Pro Val Ser Lys Gly Gln
165 170 175

Leu Lys Gln Tyr Phe Tyr Glu Thr Lys Cys Asn Pro Met Gly Tyr Thr
180 185 190

Lys Glu Gly Cys Arg Gly Ile Asp Lys Arg His Trp Asn Ser Gln Cys
195 200 205

Arg Thr Thr Gln Ser Tyr Val Arg Ala Leu Thr Met Asp Ser Lys Lys
210 215 220

Arg Ile Gly Trp Arg Phe Ile Arg Ile Asp Thr Ser Cys Val Cys Thr
225 230 235 240

Leu Thr Ile Lys Arg Gly Arg *
245

(2) INFORMATION FOR SEQ ID NO: 78:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 78:

Leu Lys Arg Arg
1

(2) INFORMATION FOR SEQ ID NO: 79:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 79:

Lys Arg Arg
1

(2) INFORMATION FOR SEQ ID NO: 80:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 80:

Leu Lys Lys Lys
1

(2) INFORMATION FOR SEQ ID NO: 81:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 41 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 81:
GAACCACACT CAGAGAGCAA TGTCCCTGCA GGACACACCA T 41
(2) INFORMATION FOR SEQ ID NO:82: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 82:
CCCGCCGC^Cr GhGCACACAC ACACAGGCCG TATCTATCCG GATAAA 46
(2) INFORMATION FOR SEQ ID NO: 83:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 33 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 83:
CCGGAAGGCT GTGAGACTTA AGCGGCGGGG TAC 33
(2) INFORMATION FOR SEQ ID NO: 84: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 84:
CCCGCCGCTT AAGTCTCACA GCCTT 25
(2) INFORMATION FOR SEQ ID NO: 85:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 42 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 85:

ATGACCATCC TTTTCCTTAC TATGGTTATT TCATACTTTG GT 42

(2) INFORMATION FOR SEQ ID NO: 86:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 47 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 86:
GGGACGCGTT TAATGGTCAA TCTACATACA CAAGAAGTGC TTATCCT 47
(2) INFORMATION FOR SEQ ID NO: 87:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 47 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 87:
CGCGGAAGAC TTAAGAAGAA ACTGCCGTTC CACCTGCTGT ACGGTAC 47
(2) INFORMATION FOR SEQ ID NO: 88:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 39 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 88:
CGTACAGCAG GTGGAACGGC AGTTTCTTCT TAAGTCTTC 39

(2) INFORMATION FOR SEQ ID NO: 89:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 89:
ATGGAGCTGA GGCCCTGGTT GCTATGGGTG 30

(2) INFORMATION FOR SEQ ID NO: 90: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: f. amino acids 

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(ix) FEATURE:

(A) NAME/KEY: Peptide
(B) LOCATION: 5..6

(D) OTHER INFORMATION: /note= "The glutamine and lysine
residues at this location may be repeated as a unit 1 to 5
times."

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 90:

Gln Gly Pro Gly Lys



WO 97/28272 PCT/US97/01470

CLAIMS 



1. A fusion protein comprising three domains joined together in order,
from amino-terminus to carboxy-terminus, of a first domain comprising a protein of
interest, a second domain comprising a hydrophilic spacer, and an affinity domain,
each domain comprising amino acid residues.

2. The fusion protein of Claim 1 wherein said amino acids of said
hydrophilic spacer are susceptible to removal by a means for selective amino acid
removal.



3. The fusion protein of Claim 2 wherein said means for selective
amino acid removal comprises a carboxypeptidase.



4. The fusion protein of Claim 3 wherein said carboxypeptidase is 
selected from the group comprising carboxypeptidase A, carboxypeptidase B and 
carboxypeptidase Y. 



5. The fusion protein of Claim 2 wherein said susceptible amino acids
of said hydrophilic spacer are selected from the group consisting of arginine and
lysine.
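Claims 2 to 5 rest on the spacer residues being removable by a carboxypeptidase, and Claim 5 limits those residues to arginine and lysine, the carboxy-terminal residues that carboxypeptidase B (CPB) releases. The Python sketch below is an idealized model under stated assumptions, not the patented procedure: it assumes exhaustive digestion in which CPB removes C-terminal Arg and Lys one at a time and halts at the first other residue, and it ignores kinetics and sequence-context effects. The function name `cpb_trim` and the example sequence are hypothetical.

```python
# Idealized model of exhaustive carboxypeptidase B (CPB) digestion.
# Assumption: CPB releases C-terminal Arg (R) and Lys (K) one at a time and
# stops at the first other residue; kinetics and context effects are ignored.
CPB_SUSCEPTIBLE = frozenset("RK")


def cpb_trim(seq: str) -> str:
    """Return `seq` (one-letter codes, N- to C-terminus) after idealized CPB digestion."""
    end = len(seq)
    # Walk back from the carboxy-terminus while the exposed residue is Arg or Lys.
    while end > 0 and seq[end - 1] in CPB_SUSCEPTIBLE:
        end -= 1
    return seq[:end]


if __name__ == "__main__":
    # Hypothetical released protein: "MKTGSA" plus a basic spacer remnant "KRR".
    print(cpb_trim("MKTGSAKRR"))  # -> MKTGSA
```

Because the loop only inspects the current carboxy-terminus, basic residues inside the protein of interest (the K near the amino-terminus of the example) are left untouched.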



6. The fusion protein of Claim 5 wherein said susceptible amino acids
of said hydrophilic spacer have the sequence selected from the group comprising
SEQ ID NOS:16-37.

7. The fusion protein of Claim 1 wherein said hydrophilic spacer is an
extended hydrophilic spacer.
8. The fusion protein of Claim 7 wherein said extended hydrophilic
spacer comprises the amino acid sequence of either SEQ ID NOS: 18 or 19 joined
to the carboxy-terminus of an amino acid sequence selected from the group
comprising SEQ ID NOS: 16-37 such that said SEQ ID NOS: 18 or 19 are located
between said SEQ ID NOS: 16-37 and said affinity domain.

9. The fusion protein of Claim 1 further comprising a signal peptide
sequence located at the amino-terminus of said fusion protein and joined to said
first domain.



10. The fusion protein of Claim 9 wherein said signal sequence is the
sequence of SEQ ID NO:61.



11. The fusion protein of Claim 1 further comprising an endoprotease
recognition sequence joined to said second domain between said second domain
and said affinity domain.

12. The fusion protein of Claim 1 further comprising a CPB terminator
joined to said first domain comprising said protein of interest between said first
domain and said second domain comprising said hydrophilic spacer.

13. The fusion protein of Claim 11 further comprising a penultimate
enhancer joined to said second domain comprising said hydrophilic spacer and
between said second domain and said endoprotease recognition sequence.

14. A recombinant DNA vector having a nucleotide sequence encoding a
fusion protein comprising three domains joined together in order, from
amino-terminus to carboxy-terminus, of a first domain comprising a protein of interest, a
second domain comprising a hydrophilic spacer, and an affinity domain, each
domain comprising amino acid residues.
15. The recombinant DNA vector of Claim 14 wherein said amino acids
of said hydrophilic spacer are susceptible to removal by a means for selective
amino acid removal.



16. The recombinant DNA vector of Claim 15 wherein said means for
selective amino acid removal comprises a carboxypeptidase.

17. The recombinant DNA vector of Claim 16 wherein said
carboxypeptidase is selected from the group comprising carboxypeptidase A,
carboxypeptidase B and carboxypeptidase Y.

18. The recombinant vector of Claim 15 wherein said susceptible amino
acids of said hydrophilic spacer are selected from the group consisting of arginine
and lysine.



19. A method of producing authentic recombinant proteins of interest,
comprising:

a) providing:

i) a recombinant DNA vector encoding a fusion protein
comprising three domains joined together in order from
amino-terminus to carboxy-terminus of a first domain comprising a protein
of interest, a second domain comprising a hydrophilic spacer, a third
domain comprising an endoprotease recognition sequence and an
affinity domain, each domain comprising amino acid residues;

ii) a host cell suitable for expressing said fusion protein
encoded by said recombinant DNA vector;

iii) an endoprotease capable of cleaving said fusion
protein within said endoprotease recognition sequence;

iv) an affinity resin capable of interacting with said
affinity domain on said fusion protein; and

v) a means for removing non-authentic amino acids from
said first domain comprising said protein of interest;
b) introducing said vector into said host cell under conditions
such that said fusion protein is expressed;

c) purifying said expressed fusion protein by means of
interaction of said affinity domain on said fusion protein with an affinity
resin;

d) cleaving said purified fusion protein with said endoprotease to
generate a released protein of interest; and

e) removing any non-authentic amino acids present at the
carboxy-terminus of said released protein of interest with said removal
means to produce an authentic protein of interest.



20. The method of Claim 19 wherein said removal means comprises at
least one carboxypeptidase and said removal comprises contacting said released
protein of interest with said at least one carboxypeptidase under conditions such
that said non-authentic amino acids are removed to generate said authentic protein
of interest.
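Steps (d) and (e) of Claims 19 and 20 can be traced on a one-letter sequence string. This Python sketch rests on labeled assumptions rather than on the claims themselves: the endoprotease is taken to cut on the carboxy-terminal side of the first occurrence of a dibasic motif such as Lys-Arg, the spacer is taken to consist only of Arg/Lys, and carboxypeptidase B is modeled as exhaustive removal of C-terminal Arg/Lys. `FusionDesign`, `endoprotease_release`, `cpb_trim`, `produce_authentic`, and every example sequence are hypothetical; the block is self-contained.

```python
from dataclasses import dataclass

# Residues idealized carboxypeptidase B is assumed to release (Arg, Lys).
CPB_SUSCEPTIBLE = frozenset("RK")


@dataclass(frozen=True)
class FusionDesign:
    """Hypothetical fusion layout, amino- to carboxy-terminus, one-letter codes."""

    protein_of_interest: str
    spacer: str           # assumed to contain only Arg/Lys
    cleavage_site: str    # assumed dibasic endoprotease motif, e.g. "KR"
    affinity_domain: str  # arbitrary placeholder sequence

    def sequence(self) -> str:
        return (self.protein_of_interest + self.spacer
                + self.cleavage_site + self.affinity_domain)


def endoprotease_release(fusion: str, site: str) -> str:
    """Cut on the C-terminal side of the first `site` match (idealized).

    Returns the N-terminal fragment, i.e. the released protein of interest
    still carrying the spacer and motif residues.
    """
    idx = fusion.find(site)
    if idx < 0:
        raise ValueError(f"recognition motif {site!r} not found")
    return fusion[: idx + len(site)]


def cpb_trim(seq: str) -> str:
    """Idealized exhaustive CPB digestion: strip C-terminal Arg/Lys."""
    end = len(seq)
    while end > 0 and seq[end - 1] in CPB_SUSCEPTIBLE:
        end -= 1
    return seq[:end]


def produce_authentic(design: FusionDesign) -> str:
    # Expression and affinity capture (steps b and c) are assumed to yield the
    # intact fusion; step d is the endoprotease cut, step e the CPB trim.
    released = endoprotease_release(design.sequence(), design.cleavage_site)
    return cpb_trim(released)


if __name__ == "__main__":
    design = FusionDesign(
        protein_of_interest="MGSAVLE",  # hypothetical
        spacer="RRRR",                  # hypothetical basic spacer
        cleavage_site="KR",             # hypothetical dibasic motif
        affinity_domain="GSGSAAAA",     # placeholder, not a real affinity tag
    )
    print(produce_authentic(design))  # -> MGSAVLE
```

A real design would also have to keep the recognition motif out of the protein of interest itself, since `str.find` returns the first match; the sketch raises an error only when the motif is absent altogether.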

21. The method of Claim 19 wherein said affinity domain comprises a
portion of the Fc domain of human IgG1.

22. The method of Claim 21 wherein said affinity resin is selected from 
the group comprising protein A and protein G. 

23. The method of Claim 19 wherein said affinity domain comprises a
portion of the protein glutathione-S-transferase.

24. The method of Claim 23 wherein said fusion protein is purified on a
glutathione resin.

25. The method of Claim 19 wherein said affinity domain comprises a
portion of the maltose binding protein.
26. The method of Claim 25 wherein said fusion protein is purified on
an amylose resin.



27. The method of Claim 19 wherein said affinity domain comprises a
portion of the staphylococcal protein A.

28. The method of Claim 27 wherein said fusion protein is purified on
an IgG resin.

29. The method of Claim 19 wherein said affinity domain comprises a
portion of the protein β-galactosidase.

30. The method of Claim 29 wherein said fusion protein is purified on
p-aminophenyl-β-D-thiogalactosidyl-succinyldiaminohexyl-Sepharose.
FIGURE 10

FIGURE 12

FIGURE 14

FIGURE 16

Met-Leu

Phe-Leu

Met-Phe
[Figure: nucleotide sequence (coding and complementary strands) with the deduced amino acid sequence of a precursor beginning Met Ser Met Leu Phe Tyr Thr Leu Ile Thr Ala Phe Leu Ile Gly Ile Gln Ala; the translated region includes the segment Gly Ala Ala Pro Phe Asn Arg Thr His Arg Ser Lys Arg Ser Ser Ser His Pro Ile Phe His Arg Gly Glu Phe Ser Val Cys Asp Ser Val Ser Val Trp Val Gly.]
[Figure: nucleotide sequence (coding and complementary strands) with the deduced amino acid sequence of SEQ ID NO: 77, beginning Met Thr Ile Leu Phe Leu Thr Met Val Ile Ser Tyr Phe Gly Cys Met and ending Leu Thr Ile Lys Arg Gly Arg; residues are numbered from -128 at the initiating Met, and NcoI sites are marked.]